Tourism.xlsx - raw data that is used in this project.
CustomerID: Unique customer ID
ProdTaken: Whether the customer has purchased a package or not (0: No, 1: Yes)
Age: Age of customer
TypeofContact: How the customer was contacted (Company Invited or Self Enquiry)
CityTier: City tier depends on the development of a city, population, facilities, and living standards. The categories are ordered, i.e. Tier 1 > Tier 2 > Tier 3
Occupation: Occupation of customer
Gender: Gender of customer
NumberOfPersonVisiting: Total number of persons planning to take the trip with the customer
PreferredPropertyStar: Preferred hotel property rating by customer
MaritalStatus: Marital status of customer
NumberOfTrips: Average number of trips in a year by customer
Passport: Whether the customer has a passport or not (0: No, 1: Yes)
OwnCar: Whether the customer owns a car or not (0: No, 1: Yes)
NumberOfChildrenVisiting: Total number of children under the age of 5 planning to take the trip with the customer
Designation: Designation of the customer in the current organization
MonthlyIncome: Gross monthly income of the customer
PitchSatisfactionScore: Sales pitch satisfaction score
ProductPitched: Product pitched by the salesperson
NumberOfFollowups: Total number of follow-ups done by the salesperson after the sales pitch
DurationOfPitch: Duration of the pitch by a salesperson to the customer

# This will help in making the python code more structured automatically
%load_ext nb_black
# Library to suppress warnings or deprecation notes
import warnings
warnings.filterwarnings("ignore")
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np
# Library to split data into training and testing set
from sklearn.model_selection import train_test_split
# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Resize the picture
plt.rc("figure", figsize=[10, 6])
# Remove the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Set the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
# ----- Packages to build the models ------
# Library to get different metric scores
from sklearn import metrics
# Library to tune model
from sklearn.model_selection import GridSearchCV
# Library for Bagging classifier
from sklearn.ensemble import BaggingClassifier
# Library for Random Forest classifier
from sklearn.ensemble import RandomForestClassifier
# Library for Decision Tree classifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
# Libraries for AdaBoost, GradientBoost, Stacking
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.ensemble import StackingClassifier
# Library for XGBoost classifier
from xgboost import XGBClassifier
tourism = pd.read_excel("Tourism.xlsx", sheet_name="Tourism")
# copy the data to another variable to keep the original data
data = tourism.copy()
data.head()
| | CustomerID | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 200000 | 1 | 41.0 | Self Enquiry | 3 | 6.0 | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 1 | 2 | 1 | 0.0 | Manager | 20993.0 |
| 1 | 200001 | 0 | 49.0 | Company Invited | 1 | 14.0 | Salaried | Male | 3 | 4.0 | Deluxe | 4.0 | Divorced | 2.0 | 0 | 3 | 1 | 2.0 | Manager | 20130.0 |
| 2 | 200002 | 1 | 37.0 | Self Enquiry | 1 | 8.0 | Free Lancer | Male | 3 | 4.0 | Basic | 3.0 | Single | 7.0 | 1 | 3 | 0 | 0.0 | Executive | 17090.0 |
| 3 | 200003 | 0 | 33.0 | Company Invited | 1 | 9.0 | Salaried | Female | 2 | 3.0 | Basic | 3.0 | Divorced | 2.0 | 1 | 5 | 1 | 1.0 | Executive | 17909.0 |
| 4 | 200004 | 0 | NaN | Self Enquiry | 1 | 8.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Divorced | 1.0 | 0 | 5 | 1 | 0.0 | Executive | 18468.0 |
data.shape
(4888, 20)
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4888 entries, 0 to 4887
Data columns (total 20 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   CustomerID                4888 non-null   int64
 1   ProdTaken                 4888 non-null   int64
 2   Age                       4662 non-null   float64
 3   TypeofContact             4863 non-null   object
 4   CityTier                  4888 non-null   int64
 5   DurationOfPitch           4637 non-null   float64
 6   Occupation                4888 non-null   object
 7   Gender                    4888 non-null   object
 8   NumberOfPersonVisiting    4888 non-null   int64
 9   NumberOfFollowups         4843 non-null   float64
 10  ProductPitched            4888 non-null   object
 11  PreferredPropertyStar     4862 non-null   float64
 12  MaritalStatus             4888 non-null   object
 13  NumberOfTrips             4748 non-null   float64
 14  Passport                  4888 non-null   int64
 15  PitchSatisfactionScore    4888 non-null   int64
 16  OwnCar                    4888 non-null   int64
 17  NumberOfChildrenVisiting  4822 non-null   float64
 18  Designation               4888 non-null   object
 19  MonthlyIncome             4655 non-null   float64
dtypes: float64(7), int64(7), object(6)
memory usage: 763.9+ KB
# Get the variables that are object types:
object_variables = data.select_dtypes(["object"])
object_variables.columns
Index(['TypeofContact', 'Occupation', 'Gender', 'ProductPitched',
'MaritalStatus', 'Designation'],
dtype='object')
# Convert to category
for col in object_variables.columns:
    data[col] = data[col].astype("category")
# Check the data
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4888 entries, 0 to 4887
Data columns (total 20 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   CustomerID                4888 non-null   int64
 1   ProdTaken                 4888 non-null   int64
 2   Age                       4662 non-null   float64
 3   TypeofContact             4863 non-null   category
 4   CityTier                  4888 non-null   int64
 5   DurationOfPitch           4637 non-null   float64
 6   Occupation                4888 non-null   category
 7   Gender                    4888 non-null   category
 8   NumberOfPersonVisiting    4888 non-null   int64
 9   NumberOfFollowups         4843 non-null   float64
 10  ProductPitched            4888 non-null   category
 11  PreferredPropertyStar     4862 non-null   float64
 12  MaritalStatus             4888 non-null   category
 13  NumberOfTrips             4748 non-null   float64
 14  Passport                  4888 non-null   int64
 15  PitchSatisfactionScore    4888 non-null   int64
 16  OwnCar                    4888 non-null   int64
 17  NumberOfChildrenVisiting  4822 non-null   float64
 18  Designation               4888 non-null   category
 19  MonthlyIncome             4655 non-null   float64
dtypes: category(6), float64(7), int64(7)
memory usage: 564.4 KB
data.drop(["CustomerID"], axis=1, inplace=True)
data.isnull().sum()
ProdTaken                     0
Age                         226
TypeofContact                25
CityTier                      0
DurationOfPitch             251
Occupation                    0
Gender                        0
NumberOfPersonVisiting        0
NumberOfFollowups            45
ProductPitched                0
PreferredPropertyStar        26
MaritalStatus                 0
NumberOfTrips               140
Passport                      0
PitchSatisfactionScore        0
OwnCar                        0
NumberOfChildrenVisiting     66
Designation                   0
MonthlyIncome               233
dtype: int64
data.duplicated().sum()
141
data[data.duplicated(keep=False)]
| | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 44 | 0 | NaN | Company Invited | 1 | 6.0 | Small Business | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 3 | 1 | 0.0 | Manager | NaN |
| 48 | 0 | 46.0 | Company Invited | 3 | 11.0 | Small Business | Male | 3 | 3.0 | Deluxe | 3.0 | Single | 5.0 | 1 | 5 | 1 | 1.0 | Manager | 20772.0 |
| 61 | 0 | 38.0 | Company Invited | 1 | 35.0 | Salaried | Female | 2 | 3.0 | Deluxe | 3.0 | Single | 2.0 | 0 | 3 | 1 | 0.0 | Manager | 17406.0 |
| 62 | 0 | 50.0 | Self Enquiry | 1 | 13.0 | Small Business | Female | 2 | 4.0 | King | 3.0 | Married | 6.0 | 1 | 4 | 1 | 1.0 | VP | 33740.0 |
| 66 | 0 | 36.0 | Company Invited | 1 | 17.0 | Salaried | Male | 3 | 4.0 | Deluxe | 4.0 | Unmarried | 2.0 | 0 | 4 | 1 | 1.0 | Manager | 21499.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 4760 | 0 | 36.0 | Self Enquiry | 1 | 9.0 | Salaried | Male | 3 | 5.0 | Standard | 4.0 | Married | 4.0 | 0 | 4 | 1 | 1.0 | Senior Manager | 28952.0 |
| 4788 | 0 | 31.0 | Self Enquiry | 1 | 14.0 | Salaried | Male | 3 | 4.0 | Deluxe | 3.0 | Married | 3.0 | 0 | 5 | 1 | 2.0 | Manager | 22169.0 |
| 4789 | 0 | 45.0 | Self Enquiry | 1 | 36.0 | Salaried | Male | 3 | 4.0 | Deluxe | 3.0 | Unmarried | 3.0 | 0 | 5 | 1 | 2.0 | Manager | 23219.0 |
| 4793 | 0 | 61.0 | Self Enquiry | 3 | 14.0 | Small Business | Male | 3 | 2.0 | Deluxe | 3.0 | Married | 2.0 | 1 | 5 | 0 | 1.0 | Manager | 23898.0 |
| 4811 | 0 | 60.0 | Self Enquiry | 3 | 10.0 | Salaried | Fe Male | 3 | 5.0 | Deluxe | 3.0 | Unmarried | 7.0 | 0 | 3 | 0 | 1.0 | Manager | 23849.0 |
282 rows × 19 columns
data.drop_duplicates(inplace=True)
data.duplicated().sum()
0
data.shape
(4747, 19)
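The duplicate check above reports 141 duplicated rows, while the `keep=False` listing shows 282 rows. The difference comes from how `duplicated` flags rows: by default the first occurrence is not flagged, whereas `keep=False` flags every member of a duplicated group. A minimal sketch on a toy frame (the values below are made up for illustration and are not from the Tourism data):

```python
import pandas as pd

# Hypothetical toy frame: rows 0, 2 and 3 are identical.
toy = pd.DataFrame({"Age": [30, 41, 30, 30], "Passport": [1, 0, 1, 1]})

# Default keep="first": only repeats after the first occurrence are flagged.
n_extra = int(toy.duplicated().sum())

# keep=False: every row that belongs to a duplicated group is flagged.
n_involved = int(toy.duplicated(keep=False).sum())

# drop_duplicates() keeps the first occurrence of each group.
deduped = toy.drop_duplicates()

print(n_extra, n_involved, deduped.shape)
```

This matches the shapes above: dropping the 141 flagged repeats from 4888 rows leaves 4747.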
data.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| ProdTaken | 4747.0 | 0.188329 | 0.391016 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| Age | 4531.0 | 37.585522 | 9.328723 | 18.0 | 31.0 | 36.0 | 44.0 | 61.0 |
| CityTier | 4747.0 | 1.655151 | 0.917416 | 1.0 | 1.0 | 1.0 | 3.0 | 3.0 |
| DurationOfPitch | 4501.0 | 15.510998 | 8.535634 | 5.0 | 9.0 | 13.0 | 20.0 | 127.0 |
| NumberOfPersonVisiting | 4747.0 | 2.911734 | 0.724040 | 1.0 | 2.0 | 3.0 | 3.0 | 5.0 |
| NumberOfFollowups | 4703.0 | 3.705082 | 1.008677 | 1.0 | 3.0 | 4.0 | 4.0 | 6.0 |
| PreferredPropertyStar | 4721.0 | 3.583351 | 0.800351 | 3.0 | 3.0 | 3.0 | 4.0 | 5.0 |
| NumberOfTrips | 4609.0 | 3.233239 | 1.847851 | 1.0 | 2.0 | 3.0 | 4.0 | 22.0 |
| Passport | 4747.0 | 0.289657 | 0.453651 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 |
| PitchSatisfactionScore | 4747.0 | 3.051612 | 1.369584 | 1.0 | 2.0 | 3.0 | 4.0 | 5.0 |
| OwnCar | 4747.0 | 0.617653 | 0.486012 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| NumberOfChildrenVisiting | 4687.0 | 1.193514 | 0.860461 | 0.0 | 1.0 | 1.0 | 2.0 | 3.0 |
| MonthlyIncome | 4523.0 | 23602.239443 | 5385.503223 | 1000.0 | 20337.0 | 22311.0 | 25535.5 | 98678.0 |
data.describe(include="category").T
| | count | unique | top | freq |
|---|---|---|---|---|
| TypeofContact | 4722 | 2 | Self Enquiry | 3350 |
| Occupation | 4747 | 4 | Salaried | 2293 |
| Gender | 4747 | 3 | Male | 2835 |
| ProductPitched | 4747 | 5 | Basic | 1800 |
| MaritalStatus | 4747 | 4 | Married | 2279 |
| Designation | 4747 | 5 | Executive | 1800 |
def generate_plot(data, feature, figsize=(10, 6), kde=True, bins=None):
    """
    Description:
    Generate both a boxplot and a histogram for any input numerical variable.
    Inputs:
    data: dataframe of the dataset
    feature: dataframe column
    figsize: size of figure (default (10,6))
    kde: whether to show the density curve (default True)
    bins: number of bins for histogram (default None)
    Output:
    Boxplot and histogram
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows=2,
        sharex=True,
        gridspec_kw={"height_ratios": (0.25, 0.75)},
        figsize=figsize,
    )
    # This is for boxplot
    sns.boxplot(data=data, x=feature, ax=ax_box2, showmeans=True, color="violet")
    # This is for histogram; pass bins only when the caller supplied a value
    if bins:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2, bins=bins)
    else:
        sns.histplot(data=data, x=feature, kde=kde, ax=ax_hist2)
    # Add mean to the histogram
    ax_hist2.axvline(data[feature].mean(), color="green", linestyle="--")
    # Add median to the histogram
    ax_hist2.axvline(data[feature].median(), color="black", linestyle="-")
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4747 entries, 0 to 4887
Data columns (total 19 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   ProdTaken                 4747 non-null   int64
 1   Age                       4531 non-null   float64
 2   TypeofContact             4722 non-null   category
 3   CityTier                  4747 non-null   int64
 4   DurationOfPitch           4501 non-null   float64
 5   Occupation                4747 non-null   category
 6   Gender                    4747 non-null   category
 7   NumberOfPersonVisiting    4747 non-null   int64
 8   NumberOfFollowups         4703 non-null   float64
 9   ProductPitched            4747 non-null   category
 10  PreferredPropertyStar     4721 non-null   float64
 11  MaritalStatus             4747 non-null   category
 12  NumberOfTrips             4609 non-null   float64
 13  Passport                  4747 non-null   int64
 14  PitchSatisfactionScore    4747 non-null   int64
 15  OwnCar                    4747 non-null   int64
 16  NumberOfChildrenVisiting  4687 non-null   float64
 17  Designation               4747 non-null   category
 18  MonthlyIncome             4523 non-null   float64
dtypes: category(6), float64(7), int64(6)
memory usage: 548.0 KB
generate_plot(data, "Age")
generate_plot(data, "CityTier")
generate_plot(data, "DurationOfPitch")
generate_plot(data, "NumberOfPersonVisiting")
generate_plot(data, "NumberOfFollowups")
generate_plot(data, "PreferredPropertyStar")
generate_plot(data, "NumberOfTrips")
generate_plot(data, "Passport")
generate_plot(data, "PitchSatisfactionScore")
generate_plot(data, "OwnCar")
generate_plot(data, "NumberOfChildrenVisiting")
generate_plot(data, "MonthlyIncome")
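The boxplots drawn by `generate_plot` mark outliers visually. To put a number on them, one option is to count values beyond 1.5 times the interquartile range, which is the default whisker rule seaborn's boxplot uses. The helper name `iqr_outlier_count` and the toy values are assumptions for illustration only:

```python
import pandas as pd


def iqr_outlier_count(series: pd.Series, k: float = 1.5) -> int:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR]; NaN values are ignored."""
    s = series.dropna()
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return int(((s < lower) | (s > upper)).sum())


# Hypothetical pitch durations with one extreme value and one missing value.
toy_duration = pd.Series([6, 8, 9, 13, 14, 20, 127, None])
n_out = iqr_outlier_count(toy_duration)
print(n_out)
```

On the real data the same helper could be applied per column, e.g. `iqr_outlier_count(data["DurationOfPitch"])`.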
def count_statistic(dataframe, feature):
    '''
    Description:
    This is a function to count the values of each type in each variable, and also compute the percentage of each type.
    Inputs:
    dataframe - the dataset
    feature - the column name
    Output:
    Count of each type and percentage
    '''
    count_values = dataframe[feature].value_counts()
    print('Counting:')
    print(count_values)
    print('\n')
    print('Population proportion:')
    print(count_values/count_values.sum())
def generate_countplot(data, feature):
    """
    Description:
    This is a function to draw a count plot
    Inputs:
    data - the dataset
    feature - the column name
    Output:
    The count plot
    """
    sns.countplot(data=data, x=feature)
count_statistic(data, "ProdTaken")
Counting:
0    3853
1     894
Name: ProdTaken, dtype: int64


Population proportion:
0    0.811671
1    0.188329
Name: ProdTaken, dtype: float64
generate_countplot(data, "ProdTaken")
count_statistic(data, "TypeofContact")
Counting:
Self Enquiry       3350
Company Invited    1372
Name: TypeofContact, dtype: int64


Population proportion:
Self Enquiry       0.709445
Company Invited    0.290555
Name: TypeofContact, dtype: float64
generate_countplot(data, "TypeofContact")
count_statistic(data, "Occupation")
Counting:
Salaried          2293
Small Business    2028
Large Business     424
Free Lancer          2
Name: Occupation, dtype: int64


Population proportion:
Salaried          0.483042
Small Business    0.427217
Large Business    0.089320
Free Lancer       0.000421
Name: Occupation, dtype: float64
generate_countplot(data, "Occupation")
count_statistic(data, "Gender")
Counting:
Male       2835
Female     1769
Fe Male     143
Name: Gender, dtype: int64


Population proportion:
Male       0.597219
Female     0.372656
Fe Male    0.030124
Name: Gender, dtype: float64
generate_countplot(data, "Gender")
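The counts above show a "Fe Male" label alongside "Female", which looks like a spelling variant of the same category. A minimal sketch of one way to merge the two labels on a category column follows; the toy frame is hypothetical, and the round trip through object dtype is a choice made here to avoid depending on how `replace` treats categoricals in a given pandas version:

```python
import pandas as pd

# Hypothetical toy column mirroring the labels seen in the Gender counts.
toy = pd.DataFrame(
    {"Gender": pd.Categorical(["Male", "Fe Male", "Female", "Fe Male"])}
)

# Cast to object, merge the misspelled label, then restore the category dtype.
toy["Gender"] = (
    toy["Gender"].astype(object).replace({"Fe Male": "Female"}).astype("category")
)

counts = toy["Gender"].value_counts()
print(counts)
```

Applied to the real column, this would fold the 143 "Fe Male" rows into "Female".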
count_statistic(data, "ProductPitched")
Counting:
Basic           1800
Deluxe          1684
Standard         714
Super Deluxe     324
King             225
Name: ProductPitched, dtype: int64


Population proportion:
Basic           0.379187
Deluxe          0.354750
Standard        0.150411
Super Deluxe    0.068254
King            0.047398
Name: ProductPitched, dtype: float64
generate_countplot(data, "ProductPitched")
count_statistic(data, "MaritalStatus")
Counting:
Married      2279
Divorced      950
Single        875
Unmarried     643
Name: MaritalStatus, dtype: int64


Population proportion:
Married      0.480093
Divorced     0.200126
Single       0.184327
Unmarried    0.135454
Name: MaritalStatus, dtype: float64
generate_countplot(data, "MaritalStatus")
count_statistic(data, "Designation")
Counting:
Executive         1800
Manager           1684
Senior Manager     714
AVP                324
VP                 225
Name: Designation, dtype: int64


Population proportion:
Executive         0.379187
Manager           0.354750
Senior Manager    0.150411
AVP               0.068254
VP                0.047398
Name: Designation, dtype: float64
generate_countplot(data, "Designation")
sns.pairplot(data, diag_kind="kde", hue="ProdTaken")
<seaborn.axisgrid.PairGrid at 0x23514057610>
# 2-D matrix:
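# numeric_only=True skips the category columns (required on recent pandas versions)
correlation = data.corr(numeric_only=True)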
correlation
| | ProdTaken | Age | CityTier | DurationOfPitch | NumberOfPersonVisiting | NumberOfFollowups | PreferredPropertyStar | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ProdTaken | 1.000000 | -0.146835 | 0.087104 | 0.074869 | 0.010352 | 0.115001 | 0.096168 | 0.026652 | 0.262567 | 0.052667 | -0.011289 | 0.009087 | -0.132070 |
| Age | -0.146835 | 1.000000 | -0.018870 | -0.012678 | 0.011708 | -0.002524 | -0.010465 | 0.178912 | 0.030165 | 0.014525 | 0.047647 | 0.005329 | 0.465681 |
| CityTier | 0.087104 | -0.018870 | 1.000000 | 0.022650 | -0.004280 | 0.027920 | -0.010565 | -0.037776 | -0.002446 | -0.043015 | 0.004300 | 0.001755 | 0.050131 |
| DurationOfPitch | 0.074869 | -0.012678 | 0.022650 | 1.000000 | 0.062663 | 0.010032 | -0.005490 | 0.007877 | 0.033907 | -0.001946 | -0.000482 | 0.029056 | -0.006716 |
| NumberOfPersonVisiting | 0.010352 | 0.011708 | -0.004280 | 0.062663 | 1.000000 | 0.328980 | 0.036227 | 0.194561 | 0.013065 | -0.014741 | 0.012453 | 0.609440 | 0.193832 |
| NumberOfFollowups | 0.115001 | -0.002524 | 0.027920 | 0.010032 | 0.328980 | 1.000000 | -0.024625 | 0.138336 | 0.008428 | 0.002003 | 0.010040 | 0.284572 | 0.172677 |
| PreferredPropertyStar | 0.096168 | -0.010465 | -0.010565 | -0.005490 | 0.036227 | -0.024625 | 1.000000 | 0.014035 | -0.004682 | -0.018786 | 0.017434 | 0.036860 | 0.015494 |
| NumberOfTrips | 0.026652 | 0.178912 | -0.037776 | 0.007877 | 0.194561 | 0.138336 | 0.014035 | 1.000000 | 0.014727 | -0.005666 | -0.016342 | 0.168567 | 0.137790 |
| Passport | 0.262567 | 0.030165 | -0.002446 | 0.033907 | 0.013065 | 0.008428 | -0.004682 | 0.014727 | 1.000000 | -0.003041 | -0.020330 | 0.015857 | 0.002805 |
| PitchSatisfactionScore | 0.052667 | 0.014525 | -0.043015 | -0.001946 | -0.014741 | 0.002003 | -0.018786 | -0.005666 | -0.003041 | 1.000000 | 0.070803 | 0.001386 | 0.030190 |
| OwnCar | -0.011289 | 0.047647 | 0.004300 | -0.000482 | 0.012453 | 0.010040 | 0.017434 | -0.016342 | -0.020330 | 0.070803 | 1.000000 | 0.028208 | 0.077672 |
| NumberOfChildrenVisiting | 0.009087 | 0.005329 | 0.001755 | 0.029056 | 0.609440 | 0.284572 | 0.036860 | 0.168567 | 0.015857 | 0.001386 | 0.028208 | 1.000000 | 0.199233 |
| MonthlyIncome | -0.132070 | 0.465681 | 0.050131 | -0.006716 | 0.193832 | 0.172677 | 0.015494 | 0.137790 | 0.002805 | 0.030190 | 0.077672 | 0.199233 | 1.000000 |
# correlation heatmap:
plt.figure(figsize=(20, 10))
sns.heatmap(correlation, annot=True, vmin=-1, vmax=1, fmt=".2f", cmap="Spectral")
<AxesSubplot:>
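Reading the target's row out of a 13 x 13 matrix is easier when the correlations with `ProdTaken` are ranked by absolute value. In the matrix above, Passport (0.26) is the strongest linear association with the target, followed by Age (-0.15) and MonthlyIncome (-0.13). A minimal sketch of that ranking on a hypothetical toy frame (the values are invented; only the column names echo the dataset):

```python
import pandas as pd

# Hypothetical toy data; Gender is non-numeric and is excluded by numeric_only.
toy = pd.DataFrame(
    {
        "ProdTaken": [1, 0, 1, 0, 1, 0],
        "Passport": [1, 0, 1, 0, 0, 0],
        "Age": [25, 50, 30, 45, 28, 52],
        "Gender": ["M", "F", "F", "M", "F", "M"],
    }
)

corr = toy.corr(numeric_only=True)

# Correlation of each feature with the target, strongest (by magnitude) first.
ranked = corr["ProdTaken"].drop("ProdTaken").sort_values(key=abs, ascending=False)
print(ranked)
```

On the real data the equivalent call would be `correlation["ProdTaken"].drop("ProdTaken").sort_values(key=abs, ascending=False)`.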
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4747 entries, 0 to 4887
Data columns (total 19 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   ProdTaken                 4747 non-null   int64
 1   Age                       4531 non-null   float64
 2   TypeofContact             4722 non-null   category
 3   CityTier                  4747 non-null   int64
 4   DurationOfPitch           4501 non-null   float64
 5   Occupation                4747 non-null   category
 6   Gender                    4747 non-null   category
 7   NumberOfPersonVisiting    4747 non-null   int64
 8   NumberOfFollowups         4703 non-null   float64
 9   ProductPitched            4747 non-null   category
 10  PreferredPropertyStar     4721 non-null   float64
 11  MaritalStatus             4747 non-null   category
 12  NumberOfTrips             4609 non-null   float64
 13  Passport                  4747 non-null   int64
 14  PitchSatisfactionScore    4747 non-null   int64
 15  OwnCar                    4747 non-null   int64
 16  NumberOfChildrenVisiting  4687 non-null   float64
 17  Designation               4747 non-null   category
 18  MonthlyIncome             4523 non-null   float64
dtypes: category(6), float64(7), int64(6)
memory usage: 708.0 KB
# Create a function to do stacked plot:
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart
    data: dataframe
    predictor: independent variable
    target: target variable
    """
    # Sort rows by the least frequent target class (the last entry of value_counts)
    sorter = data[target].value_counts().index[-1]
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    # Row-normalized table: each bar shows the share of each target class
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(20, 6))
    # Place the legend outside the axes, to the right of the plot
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
stacked_barplot(data, "Age", "ProdTaken")
ProdTaken     0    1   All
Age
All        3679  852  4531
29.0        120   57   177
30.0        146   47   193
31.0        149   40   189
32.0        150   40   190
34.0        164   39   203
33.0        145   37   182
26.0         68   36   104
35.0        199   32   231
27.0        106   29   135
28.0        117   26   143
20.0         13   25    38
36.0        198   25   223
37.0        158   24   182
21.0         18   23    41
41.0        127   23   150
40.0        121   22   143
19.0         11   21    32
25.0         53   20    73
42.0        119   18   137
24.0         38   18    56
51.0         71   17    88
45.0         93   17   110
44.0         83   16    99
22.0         31   15    46
38.0        157   15   172
52.0         54   14    68
39.0        135   13   148
23.0         33   13    46
59.0         30   12    42
56.0         43   12    55
47.0         75   12    87
50.0         72   12    84
46.0        106   11   117
48.0         53   11    64
58.0         19   11    30
49.0         56    9    65
43.0        116    9   125
53.0         56    8    64
18.0          6    8    14
55.0         56    7    63
57.0         23    5    28
54.0         57    2    59
60.0         26    1    27
61.0          8    0     8
------------------------------------------------------------------------------------------------------------------------
plt.figure(figsize=(20, 10))
sns.pointplot(x="Age", y="ProdTaken", data=data, estimator=sum, ci=None)
<AxesSubplot:xlabel='Age', ylabel='ProdTaken'>
stacked_barplot(data, "TypeofContact", "ProdTaken")
ProdTaken           0    1   All
TypeofContact
All              3831  891  4722
Self Enquiry     2753  597  3350
Company Invited  1078  294  1372
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "CityTier", "ProdTaken")
ProdTaken     0    1   All
CityTier
All        3853  894  4747
1          2592  506  3098
3          1115  346  1461
2           146   42   188
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "DurationOfPitch", "ProdTaken")
ProdTaken           0    1   All
DurationOfPitch
All              3651  850  4501
9.0               388   78   466
8.0               265   59   324
16.0              215   55   270
15.0              210   52   262
7.0               287   47   334
10.0              191   43   234
14.0              204   41   245
6.0               261   38   299
11.0              162   34   196
12.0              155   32   187
13.0              181   32   213
31.0               50   30    80
30.0               62   28    90
17.0              146   23   169
22.0               66   22    88
19.0               35   20    55
29.0               51   20    71
23.0               58   19    77
18.0               55   18    73
20.0               43   18    61
28.0               44   17    61
32.0               57   15    72
21.0               55   15    70
24.0               54   15    69
27.0               58   14    72
25.0               58   14    72
26.0               59   12    71
33.0               45   11    56
35.0               54   11    65
36.0               32    9    41
34.0               42    8    50
126.0               1    0     1
127.0               1    0     1
5.0                 6    0     6
------------------------------------------------------------------------------------------------------------------------
sns.pointplot(x="DurationOfPitch", y="ProdTaken", data=data, estimator=sum, ci=None)
<AxesSubplot:xlabel='DurationOfPitch', ylabel='ProdTaken'>
stacked_barplot(data, "Occupation", "ProdTaken")
ProdTaken          0    1   All
Occupation
All             3853  894  4747
Salaried        1893  400  2293
Small Business  1654  374  2028
Large Business   306  118   424
Free Lancer        0    2     2
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "Gender", "ProdTaken")
ProdTaken     0    1   All
Gender
All        3853  894  4747
Male       2273  562  2835
Female     1461  308  1769
Fe Male     119   24   143
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "NumberOfPersonVisiting", "ProdTaken")
ProdTaken                  0    1   All
NumberOfPersonVisiting
All                     3853  894  4747
3                       1889  447  2336
2                       1108  256  1364
4                        818  191  1009
1                         35    0    35
5                          3    0     3
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "NumberOfFollowups", "ProdTaken")
ProdTaken             0    1   All
NumberOfFollowups
All                3817  886  4703
4.0                1632  367  1999
3.0                1187  234  1421
5.0                 557  188   745
6.0                  82   53   135
2.0                 204   24   228
1.0                 155   20   175
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "ProductPitched", "ProdTaken")
ProdTaken          0    1   All
ProductPitched
All             3853  894  4747
Basic           1260  540  1800
Deluxe          1486  198  1684
Standard         594  120   714
King             205   20   225
Super Deluxe     308   16   324
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "PreferredPropertyStar", "ProdTaken")
ProdTaken                 0    1   All
PreferredPropertyStar
All                    3833  888  4721
3.0                    2435  470  2905
5.0                     696  242   938
4.0                     702  176   878
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "MaritalStatus", "ProdTaken")
ProdTaken         0    1   All
MaritalStatus
All            3853  894  4747
Married        1965  314  2279
Single          578  297   875
Unmarried       484  159   643
Divorced        826  124   950
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "NumberOfTrips", "ProdTaken")
ProdTaken         0    1   All
NumberOfTrips
All            3727  882  4609
2.0            1134  288  1422
3.0             838  213  1051
1.0             496  105   601
6.0             244   63   307
5.0             382   61   443
7.0             150   61   211
4.0             408   60   468
8.0              73   29   102
19.0              0    1     1
20.0              0    1     1
21.0              1    0     1
22.0              1    0     1
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "Passport", "ProdTaken")
ProdTaken     0    1   All
Passport
All        3853  894  4747
1           895  480  1375
0          2958  414  3372
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "PitchSatisfactionScore", "ProdTaken")
ProdTaken                  0    1   All
PitchSatisfactionScore
All                     3853  894  4747
3                       1123  304  1427
5                        725  198   923
4                        709  160   869
1                        798  144   942
2                        498   88   586
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "OwnCar", "ProdTaken")
ProdTaken     0    1   All
OwnCar
All        3853  894  4747
1          2390  542  2932
0          1463  352  1815
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "NumberOfChildrenVisiting", "ProdTaken")
ProdTaken                    0    1   All
NumberOfChildrenVisiting
All                       3800  887  4687
1.0                       1635  379  2014
2.0                       1056  248  1304
0.0                        851  194  1045
3.0                        258   66   324
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "Designation", "ProdTaken")
ProdTaken          0    1   All
Designation
All             3853  894  4747
Executive       1260  540  1800
Manager         1486  198  1684
Senior Manager   594  120   714
VP               205   20   225
AVP              308   16   324
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, "MonthlyIncome", "ProdTaken")
ProdTaken         0    1   All
MonthlyIncome
All            3660  863  4523
21082.0           1    4     5
17293.0           0    4     4
20971.0           0    4     4
17404.0           1    4     5
...             ...  ...   ...
21534.0           1    0     1
21524.0           2    0     2
21522.0           2    0     2
21515.0           1    0     1
22664.0           1    0     1

[2476 rows x 3 columns]
------------------------------------------------------------------------------------------------------------------------
sns.pointplot(x="MonthlyIncome", y="ProdTaken", data=data, estimator=sum, ci=None)
<AxesSubplot:xlabel='MonthlyIncome', ylabel='ProdTaken'>
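MonthlyIncome takes well over two thousand distinct values, so a cross-tab or point plot per raw value is hard to read. One option, not used in the cells above, is to bin the column into quantile bands first and then cross-tabulate the bands against the target. A minimal sketch on hypothetical toy values (the column name `IncomeBand` and the band labels are assumptions):

```python
import pandas as pd

# Hypothetical incomes and purchase flags, for illustration only.
toy = pd.DataFrame(
    {
        "MonthlyIncome": [17090, 17909, 18468, 20130, 20993, 22311, 25535, 33740],
        "ProdTaken": [1, 0, 0, 0, 1, 0, 0, 0],
    }
)

# Split into four equally populated bands (quartiles).
toy["IncomeBand"] = pd.qcut(
    toy["MonthlyIncome"], q=4, labels=["Q1", "Q2", "Q3", "Q4"]
)

# Row-normalized cross-tab: share of each target class within every band.
band_rates = pd.crosstab(toy["IncomeBand"], toy["ProdTaken"], normalize="index")
print(band_rates)
```

The binned column could also be passed to `stacked_barplot` in place of the raw income.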
Data Description
Univariate Data Analysis
Age: The distribution is slightly right-skewed. There are no outliers in the Age variable.
CityTier: There are no outliers in the CityTier variable. The median is 1, and the mean is around 1.66.
DurationOfPitch: There are some outliers in the DurationOfPitch variable. The distribution is slightly right-skewed.
NumberOfPersonVisiting: There are some outliers in the NumberOfPersonVisiting variable.
NumberOfFollowups: There are outliers in the NumberOfFollowups variable.
PreferredPropertyStar: There are no outliers in the PreferredPropertyStar variable.
NumberOfTrips: There are some outliers in the NumberOfTrips variable.
Passport: There are no outliers in the Passport variable.
PitchSatisfactionScore: There are no outliers in the PitchSatisfactionScore variable. The median is 3, and the mean is around 3.05.
OwnCar: There are no outliers in the OwnCar variable.
NumberOfChildrenVisiting: There are no outliers in the NumberOfChildrenVisiting variable.
MonthlyIncome: There are outliers in the MonthlyIncome variable.
ProdTaken: About 81.17% of customers did not purchase a travel package.
TypeofContact: 3350 customers (70.94%) made a self enquiry.
Occupation: Most customers (48.30%) are salaried, followed by 42.72% who run small businesses. Free lancers make up a very small share (2 customers).
Gender: "Fe Male" is a typo for "Female" and needs to be fixed later. There are more male than female customers in this dataset.
ProductPitched: The Basic package was pitched most often (37.92% of customers). King was pitched least often (4.74%).
MaritalStatus: Most customers (48.01%) are married, and 13.55% are unmarried.
Designation: The most common designation is Executive (37.92%). The least common is VP (4.74%).

Bivariate Data Analysis
ProdTaken vs Age:
ProdTaken vs TypeofContact:
ProdTaken vs CityTier:
ProdTaken vs DurationOfPitch:
ProdTaken vs Occupation:
ProdTaken vs Gender:
ProdTaken vs NumberOfPersonVisiting:
ProdTaken vs NumberOfFollowups:
ProdTaken vs ProductPitched:
ProdTaken vs PreferredPropertyStar:
ProdTaken vs MaritalStatus:
ProdTaken vs NumberOfTrips:
ProdTaken vs Passport:
ProdTaken vs PitchSatisfactionScore:
ProdTaken vs OwnCar:
ProdTaken vs NumberOfChildrenVisiting:
ProdTaken vs Designation:
ProdTaken vs MonthlyIncome:
data.head()
| | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 41.0 | Self Enquiry | 3 | 6.0 | Salaried | Female | 3 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 1 | 2 | 1 | 0.0 | Manager | 20993.0 |
| 1 | 0 | 49.0 | Company Invited | 1 | 14.0 | Salaried | Male | 3 | 4.0 | Deluxe | 4.0 | Divorced | 2.0 | 0 | 3 | 1 | 2.0 | Manager | 20130.0 |
| 2 | 1 | 37.0 | Self Enquiry | 1 | 8.0 | Free Lancer | Male | 3 | 4.0 | Basic | 3.0 | Single | 7.0 | 1 | 3 | 0 | 0.0 | Executive | 17090.0 |
| 3 | 0 | 33.0 | Company Invited | 1 | 9.0 | Salaried | Female | 2 | 3.0 | Basic | 3.0 | Divorced | 2.0 | 1 | 5 | 1 | 1.0 | Executive | 17909.0 |
| 4 | 0 | NaN | Self Enquiry | 1 | 8.0 | Small Business | Male | 2 | 3.0 | Basic | 4.0 | Divorced | 1.0 | 0 | 5 | 1 | 0.0 | Executive | 18468.0 |
data.isnull().sum().sort_values(ascending=False)
DurationOfPitch             246
MonthlyIncome               224
Age                         216
NumberOfTrips               138
NumberOfChildrenVisiting     60
NumberOfFollowups            44
PreferredPropertyStar        26
TypeofContact                25
Gender                        0
CityTier                      0
Occupation                    0
ProductPitched                0
NumberOfPersonVisiting        0
Designation                   0
MaritalStatus                 0
Passport                      0
PitchSatisfactionScore        0
OwnCar                        0
ProdTaken                     0
dtype: int64
# Create function to calculate mean and median
def get_stats(data, column):
    """
    Description:
        This is a function to get mean and median values for a feature.
    Inputs:
        data - the dataset
        column - the column name
    Output:
        Prints the mean and median values
    """
    print(f"The mean of {column} column is {round(data[column].mean(), 3)}")
    print(f"The median of {column} column is {round(data[column].median(), 3)}")
    print("-" * 20)
# Apply function to the columns, except TypeofContact column
target_columns = [
    "DurationOfPitch",
    "MonthlyIncome",
    "Age",
    "NumberOfTrips",
    "NumberOfChildrenVisiting",
    "NumberOfFollowups",
    "PreferredPropertyStar",
]
for col in target_columns:
    get_stats(data, col)
The mean of DurationOfPitch column is 15.511
The median of DurationOfPitch column is 13.0
--------------------
The mean of MonthlyIncome column is 23602.239
The median of MonthlyIncome column is 22311.0
--------------------
The mean of Age column is 37.586
The median of Age column is 36.0
--------------------
The mean of NumberOfTrips column is 3.233
The median of NumberOfTrips column is 3.0
--------------------
The mean of NumberOfChildrenVisiting column is 1.194
The median of NumberOfChildrenVisiting column is 1.0
--------------------
The mean of NumberOfFollowups column is 3.705
The median of NumberOfFollowups column is 4.0
--------------------
The mean of PreferredPropertyStar column is 3.583
The median of PreferredPropertyStar column is 3.0
--------------------
# Apply the median imputation (assign back instead of a chained inplace call)
for col in target_columns:
    data[col] = data[col].fillna(data[col].median())
data.isnull().sum().sort_values(ascending=False)
TypeofContact               25
MonthlyIncome                0
NumberOfFollowups            0
Age                          0
CityTier                     0
DurationOfPitch              0
Occupation                   0
Gender                       0
NumberOfPersonVisiting       0
ProductPitched               0
Designation                  0
PreferredPropertyStar        0
MaritalStatus                0
NumberOfTrips                0
Passport                     0
PitchSatisfactionScore       0
OwnCar                       0
NumberOfChildrenVisiting     0
ProdTaken                    0
dtype: int64
# Since this is a categorical column, the new label must be added with cat.add_categories before it can be used as a fill value:
data["TypeofContact"] = data["TypeofContact"].cat.add_categories("Unknown")
data["TypeofContact"] = data["TypeofContact"].fillna("Unknown")
data.isnull().sum().sort_values(ascending=False)
MonthlyIncome               0
NumberOfFollowups           0
Age                         0
TypeofContact               0
CityTier                    0
DurationOfPitch             0
Occupation                  0
Gender                      0
NumberOfPersonVisiting      0
ProductPitched              0
Designation                 0
PreferredPropertyStar       0
MaritalStatus               0
NumberOfTrips               0
Passport                    0
PitchSatisfactionScore      0
OwnCar                      0
NumberOfChildrenVisiting    0
ProdTaken                   0
dtype: int64
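The two imputation steps above can be sketched on a small frame. This is a minimal illustration, assuming a hypothetical four-row DataFrame named toy; the values are made up and are not taken from Tourism.csv.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values) with one gap in a numeric column
# and one gap in a categorical column.
toy = pd.DataFrame(
    {
        "Age": [41.0, 49.0, np.nan, 33.0],
        "TypeofContact": pd.Categorical(
            ["Self Enquiry", None, "Company Invited", "Self Enquiry"]
        ),
    }
)

# Numeric column: fill the gap with the column median.
toy["Age"] = toy["Age"].fillna(toy["Age"].median())

# Categorical column: register the new label first, then fill with it.
toy["TypeofContact"] = (
    toy["TypeofContact"].cat.add_categories("Unknown").fillna("Unknown")
)

missing_after = int(toy.isnull().sum().sum())
print(toy)
print("Missing values left:", missing_after)
```

Filling a categorical column with a label that is not yet one of its categories raises an error, which is why the category is added before the fill.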
# Z-score function
def find_z_score(data, feature, threshold=3):
    """
    Description:
        This is a function to detect the outliers in a variable.
    Inputs:
        data - the dataset
        feature - column name
        threshold - defaults to 3, because any point more than 3 standard deviations from the mean is treated as an outlier
    Output:
        List of outlier values in the variable
    """
    # Start from an empty list on every call; a shared module-level list
    # would carry values over from one column to the next.
    outlier = []
    mean = np.mean(data[feature])
    std = np.std(data[feature])
    for value in data[feature]:
        z_score = (value - mean) / std
        # use the absolute z-score so both tails are checked
        if np.abs(z_score) > threshold:
            outlier.append(value)
    return outlier
target_columns = [
    "DurationOfPitch",
    "NumberOfPersonVisiting",
    "NumberOfFollowups",
    "NumberOfTrips",
    "MonthlyIncome",
]
# Detect number of outliers for target variables:
for column in target_columns:
    outliers = find_z_score(data, column)
    print("There are ", len(outliers), " outliers in ", column, " variable")
    print("-" * 20)
There are 2 outliers in DurationOfPitch variable
--------------------
There are 2 outliers in NumberOfPersonVisiting variable
--------------------
There are 2 outliers in NumberOfFollowups variable
--------------------
There are 6 outliers in NumberOfTrips variable
--------------------
There are 10 outliers in MonthlyIncome variable
--------------------
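As a cross-check of the z-score rule, here is a minimal plain-Python sketch that counts outliers separately for each column. The helper name z_outliers and the two toy columns are hypothetical. The list is rebuilt on every call, so each count refers to one column only; if a single list were shared across calls, each printed count would also include outliers found in earlier columns.

```python
from statistics import mean, pstdev


def z_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population std, the same default as np.std
    if sigma == 0:
        return []  # constant column: no point can be an outlier
    return [v for v in values if abs((v - mu) / sigma) > threshold]


# Hypothetical columns: "a" has one extreme value, "b" has none.
columns = {
    "a": [10.0] * 19 + [100.0],
    "b": [5.0, 6.0, 7.0, 8.0],
}

counts = {name: len(z_outliers(vals)) for name, vals in columns.items()}
print(counts)
```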
# IQR function
def IQR_method(data, feature):
    '''
    Description:
        - This is a function that uses the interquartile range (IQR) method to do outlier treatment.
        - Q1 is the 25th percentile. Q3 is the 75th percentile. IQR = Q3 - Q1
        - Any data points that fall outside the minimum (Q1 - 1.5*IQR) and maximum (Q3 + 1.5*IQR) are outliers.
        - Data points that are less than the minimum are replaced with the minimum value.
        - Data points that are greater than the maximum are replaced with the maximum value.
    Inputs:
        data - the dataset
        feature - column name
    Output:
        Updated values for outliers
    '''
    Q1 = data[feature].quantile(0.25)
    Q3 = data[feature].quantile(0.75)
    IQR = Q3 - Q1
    lower_range = Q1 - 1.5 * IQR
    upper_range = Q3 + 1.5 * IQR
    # replace outliers with lower range values and upper range values:
    data[feature] = np.where(data[feature] < lower_range, lower_range, data[feature])
    data[feature] = np.where(data[feature] > upper_range, upper_range, data[feature])

# Outlier treatment for target variables:
for column in target_columns:
    IQR_method(data, column)
# Plot the target variables again to check whether the treatment removed the outliers:
for column in target_columns:
    generate_plot(data, column)
    plt.show()
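The capping rule used by IQR_method can be illustrated on a short array. This is a sketch under stated assumptions: cap_with_iqr is a hypothetical helper, the input values are made up, and np.clip is swapped in for the two np.where calls (it gives the same result for this rule).

```python
import numpy as np


def cap_with_iqr(values, k=1.5):
    """Clip values to [Q1 - k*IQR, Q3 + k*IQR] and return the bounds too."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return np.clip(values, lower, upper), lower, upper


# Hypothetical data: nine ordinary values and one extreme value.
capped, lower, upper = cap_with_iqr([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print("bounds:", lower, upper)
print("capped:", capped)
```

Only the extreme value is pulled back to the upper bound; values already inside the bounds are left unchanged.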
data["Gender"].value_counts()
Male       2835
Female     1769
Fe Male     143
Name: Gender, dtype: int64
# Replace "Fe Male" with "Female"
data["Gender"] = data["Gender"].replace("Fe Male", "Female")
# Check data
data["Gender"].value_counts()
Male      2835
Female    1912
Name: Gender, dtype: int64
data.head()
| | ProdTaken | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 41.0 | Self Enquiry | 3 | 6.0 | Salaried | Female | 3.0 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 1 | 2 | 1 | 0.0 | Manager | 20993.0 |
| 1 | 0 | 49.0 | Company Invited | 1 | 14.0 | Salaried | Male | 3.0 | 4.0 | Deluxe | 4.0 | Divorced | 2.0 | 0 | 3 | 1 | 2.0 | Manager | 20130.0 |
| 2 | 1 | 37.0 | Self Enquiry | 1 | 8.0 | Free Lancer | Male | 3.0 | 4.0 | Basic | 3.0 | Single | 7.0 | 1 | 3 | 0 | 0.0 | Executive | 17090.0 |
| 3 | 0 | 33.0 | Company Invited | 1 | 9.0 | Salaried | Female | 2.0 | 3.0 | Basic | 3.0 | Divorced | 2.0 | 1 | 5 | 1 | 1.0 | Executive | 17909.0 |
| 4 | 0 | 36.0 | Self Enquiry | 1 | 8.0 | Small Business | Male | 2.0 | 3.0 | Basic | 4.0 | Divorced | 1.0 | 0 | 5 | 1 | 0.0 | Executive | 18468.0 |
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4747 entries, 0 to 4887
Data columns (total 19 columns):
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   ProdTaken                 4747 non-null   int64
 1   Age                       4747 non-null   float64
 2   TypeofContact             4747 non-null   category
 3   CityTier                  4747 non-null   int64
 4   DurationOfPitch           4747 non-null   float64
 5   Occupation                4747 non-null   category
 6   Gender                    4747 non-null   category
 7   NumberOfPersonVisiting    4747 non-null   float64
 8   NumberOfFollowups         4747 non-null   float64
 9   ProductPitched            4747 non-null   category
 10  PreferredPropertyStar     4747 non-null   float64
 11  MaritalStatus             4747 non-null   category
 12  NumberOfTrips             4747 non-null   float64
 13  Passport                  4747 non-null   int64
 14  PitchSatisfactionScore    4747 non-null   int64
 15  OwnCar                    4747 non-null   int64
 16  NumberOfChildrenVisiting  4747 non-null   float64
 17  Designation               4747 non-null   category
 18  MonthlyIncome             4747 non-null   float64
dtypes: category(6), float64(8), int64(5)
memory usage: 708.0 KB
Missing Value Treatment
Outlier Treatment
Fix Gender Column
Model can make wrong predictions as:
Which case is more important?
Which metric to optimize?
x = data.drop("ProdTaken", axis=1)
y = data["ProdTaken"]
x.head()
| | Age | TypeofContact | CityTier | DurationOfPitch | Occupation | Gender | NumberOfPersonVisiting | NumberOfFollowups | ProductPitched | PreferredPropertyStar | MaritalStatus | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | Designation | MonthlyIncome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 41.0 | Self Enquiry | 3 | 6.0 | Salaried | Female | 3.0 | 3.0 | Deluxe | 3.0 | Single | 1.0 | 1 | 2 | 1 | 0.0 | Manager | 20993.0 |
| 1 | 49.0 | Company Invited | 1 | 14.0 | Salaried | Male | 3.0 | 4.0 | Deluxe | 4.0 | Divorced | 2.0 | 0 | 3 | 1 | 2.0 | Manager | 20130.0 |
| 2 | 37.0 | Self Enquiry | 1 | 8.0 | Free Lancer | Male | 3.0 | 4.0 | Basic | 3.0 | Single | 7.0 | 1 | 3 | 0 | 0.0 | Executive | 17090.0 |
| 3 | 33.0 | Company Invited | 1 | 9.0 | Salaried | Female | 2.0 | 3.0 | Basic | 3.0 | Divorced | 2.0 | 1 | 5 | 1 | 1.0 | Executive | 17909.0 |
| 4 | 36.0 | Self Enquiry | 1 | 8.0 | Small Business | Male | 2.0 | 3.0 | Basic | 4.0 | Divorced | 1.0 | 0 | 5 | 1 | 0.0 | Executive | 18468.0 |
columns = [
    "TypeofContact",
    "Occupation",
    "Gender",
    "ProductPitched",
    "MaritalStatus",
    "Designation",
]
x = pd.get_dummies(x, columns=columns, drop_first=True)
x.head()
| | Age | CityTier | DurationOfPitch | NumberOfPersonVisiting | NumberOfFollowups | PreferredPropertyStar | NumberOfTrips | Passport | PitchSatisfactionScore | OwnCar | NumberOfChildrenVisiting | MonthlyIncome | TypeofContact_Self Enquiry | TypeofContact_Unknown | Occupation_Large Business | Occupation_Salaried | Occupation_Small Business | Gender_Male | ProductPitched_Deluxe | ProductPitched_King | ProductPitched_Standard | ProductPitched_Super Deluxe | MaritalStatus_Married | MaritalStatus_Single | MaritalStatus_Unmarried | Designation_Executive | Designation_Manager | Designation_Senior Manager | Designation_VP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 41.0 | 3 | 6.0 | 3.0 | 3.0 | 3.0 | 1.0 | 1 | 2 | 1 | 0.0 | 20993.0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 1 | 49.0 | 1 | 14.0 | 3.0 | 4.0 | 4.0 | 2.0 | 0 | 3 | 1 | 2.0 | 20130.0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 37.0 | 1 | 8.0 | 3.0 | 4.0 | 3.0 | 7.0 | 1 | 3 | 0 | 0.0 | 17090.0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 |
| 3 | 33.0 | 1 | 9.0 | 2.0 | 3.0 | 3.0 | 2.0 | 1 | 5 | 1 | 1.0 | 17909.0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 4 | 36.0 | 1 | 8.0 | 2.0 | 3.0 | 4.0 | 1.0 | 0 | 5 | 1 | 0.0 | 18468.0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
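To see what drop_first=True does in the encoding step, here is a minimal sketch on a hypothetical three-row frame (the frame name toy and its values are illustrative only). For a column with k categories, pd.get_dummies keeps k - 1 indicator columns; the dropped first category becomes the baseline, encoded as all zeros.

```python
import pandas as pd

# Hypothetical frame with one categorical and one numeric column.
toy = pd.DataFrame(
    {
        "Gender": ["Female", "Male", "Male"],
        "MonthlyIncome": [20993.0, 20130.0, 17090.0],
    }
)

# Two categories (Female, Male) become a single indicator column.
encoded = pd.get_dummies(toy, columns=["Gender"], drop_first=True)

print(encoded.columns.tolist())
print(encoded["Gender_Male"].astype(int).tolist())
```

This matches the encoded table above, where Gender appears only as Gender_Male and a value of 0 means Female.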
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.3, random_state=1, stratify=y
)
x_train.shape
(3322, 29)
y_train.shape
(3322,)
x_test.shape
(1425, 29)
y_test.shape
(1425,)
y_test.value_counts()
0    1157
1     268
Name: ProdTaken, dtype: int64
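A quick arithmetic check shows what stratify=y achieves. The sketch below uses only the counts printed above; the variable names are illustrative. The share of purchasers in the test set should be close to the share in the full dataset, which the univariate analysis puts at roughly 18.82% (100% minus the 81.18% who did not purchase).

```python
# Class counts of y_test as printed above.
test_counts = {0: 1157, 1: 268}

total = sum(test_counts.values())
positive_share = test_counts[1] / total

print(f"Test set size: {total}")
print(f"Share of ProdTaken = 1 in the test set: {positive_share:.2%}")
```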
# Confusion matrix function:
def confusion_matrix(model, predictor, target):
    """
    Description:
        This is the function to create confusion matrix and heatmap
    Inputs:
        model: classifier
        predictor - independent variables
        target - dependent variable
    Outputs:
        Heatmap plot with confusion matrix values
    """
    prediction = model.predict(predictor)
    # `metrics` refers to the sklearn.metrics module
    cm = metrics.confusion_matrix(target, prediction)
    # Label each cell with its count and its share of all observations
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
# Create a function to compute the model metrics:
def model_metrics(model, predictor, target):
    """
    Description:
        This is the function to compute the model metrics
    Inputs:
        model: classifier
        predictor - independent variables
        target - dependent variable
    Outputs:
        Model metrics
    """
    # Do the prediction:
    prediction = model.predict(predictor)
    # Calculate the accuracy:
    accuracy = metrics.accuracy_score(target, prediction)
    # Calculate recall:
    recall = metrics.recall_score(target, prediction)
    # Calculate precision:
    precision = metrics.precision_score(target, prediction)
    # Calculate F1 score:
    f1 = metrics.f1_score(target, prediction)
    # Create a dataframe of metrics
    metrics_dataframe = pd.DataFrame(
        {
            "Accuracy": accuracy,
            "Recall": recall,
            "Precision": precision,
            "F1": f1,
        },
        index=[0],
    )
    return metrics_dataframe
# Build model
bagging = BaggingClassifier(random_state=1)
bagging.fit(x_train, y_train)
BaggingClassifier(random_state=1)
# Calculate the model metrics for training dataset
bagging_metrics_train = model_metrics(bagging, x_train, y_train)
bagging_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.994281 | 0.971246 | 0.998358 | 0.984615 |
# Confusion matrix for training dataset
confusion_matrix(bagging, x_train, y_train)
# Calculate the model metrics for testing dataset
bagging_metrics_test = model_metrics(bagging, x_test, y_test)
bagging_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.90386 | 0.630597 | 0.816425 | 0.711579 |
# Confusion matrix for testing dataset
confusion_matrix(bagging, x_test, y_test)
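The four test metrics for the bagging model can be reproduced by hand from confusion-matrix counts. The counts below are an assumption: they are inferred from the reported metrics and the test-set class totals (268 purchasers, 1157 non-purchasers), not read from the heatmap.

```python
# Assumed confusion-matrix counts (inferred, not read from the plot).
tp, fn = 169, 99    # purchasers: correctly flagged / missed
fp, tn = 38, 1119   # non-purchasers: wrongly flagged / correctly rejected

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)        # share of actual purchasers that were found
precision = tp / (tp + fp)     # share of predicted purchasers that were right
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 6), round(recall, 6), round(precision, 6), round(f1, 6))
```

Under these counts, the accuracy of about 0.90 sits alongside a recall of about 0.63: the model still misses 99 of the 268 purchasers in the test set.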
# Build the model
random_forest = RandomForestClassifier(random_state=1)
random_forest.fit(x_train, y_train)
RandomForestClassifier(random_state=1)
# Calculate the model metrics for training dataset
rf_metrics_train = model_metrics(random_forest, x_train, y_train)
rf_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0 | 1.0 |
# Confusion matrix for training dataset
confusion_matrix(random_forest, x_train, y_train)
# Calculate the model metrics for testing dataset
rf_metrics_test = model_metrics(random_forest, x_test, y_test)
rf_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.910877 | 0.597015 | 0.893855 | 0.715884 |
# Confusion matrix for testing dataset
confusion_matrix(random_forest, x_test, y_test)
# Build the model
decision_tree = DecisionTreeClassifier(random_state=1)
decision_tree.fit(x_train, y_train)
DecisionTreeClassifier(random_state=1)
# Calculate the model metrics for training dataset
tree_metrics_train = model_metrics(decision_tree, x_train, y_train)
tree_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0 | 1.0 |
# Confusion matrix for training dataset
confusion_matrix(decision_tree, x_train, y_train)
# Calculate the model metrics for testing dataset
tree_metrics_test = model_metrics(decision_tree, x_test, y_test)
tree_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.893333 | 0.742537 | 0.705674 | 0.723636 |
# Confusion matrix for testing dataset
confusion_matrix(decision_tree, x_test, y_test)
feature_names = x_train.columns.tolist()
# Plot the model
plt.figure(figsize=(20, 30))
output = tree.plot_tree(
    decision_tree,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=False,
    class_names=None,
)
# The code below adds arrows to the decision tree splits if they are missing
for line in output:
    arrow = line.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
# Text report showing the rules of a decision tree -
print(tree.export_text(decision_tree, feature_names=feature_names, show_weights=True))
|--- Passport <= 0.50 | |--- Age <= 22.50 | | |--- PitchSatisfactionScore <= 2.50 | | | |--- weights: [18.00, 0.00] class: 0 | | |--- PitchSatisfactionScore > 2.50 | | | |--- Occupation_Large Business <= 0.50 | | | | |--- MonthlyIncome <= 21427.50 | | | | | |--- PitchSatisfactionScore <= 3.50 | | | | | | |--- Gender_Male <= 0.50 | | | | | | | |--- Occupation_Salaried <= 0.50 | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | |--- Occupation_Salaried > 0.50 | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | |--- Gender_Male > 0.50 | | | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | | |--- PitchSatisfactionScore > 3.50 | | | | | | |--- DurationOfPitch <= 20.00 | | | | | | | |--- TypeofContact_Self Enquiry <= 0.50 | | | | | | | | |--- CityTier <= 2.00 | | | | | | | | | |--- DurationOfPitch <= 10.00 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- DurationOfPitch > 10.00 | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | |--- CityTier > 2.00 | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | |--- TypeofContact_Self Enquiry > 0.50 | | | | | | | | |--- DurationOfPitch <= 9.50 | | | | | | | | | |--- MonthlyIncome <= 17413.50 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | |--- MonthlyIncome > 17413.50 | | | | | | | | | | |--- NumberOfFollowups <= 4.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- NumberOfFollowups > 4.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- DurationOfPitch > 9.50 | | | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | | |--- DurationOfPitch > 20.00 | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | |--- MonthlyIncome > 21427.50 | | | | | |--- weights: [6.00, 0.00] class: 0 | | | |--- Occupation_Large Business > 0.50 | | | | |--- weights: [0.00, 9.00] class: 1 | |--- Age > 22.50 | | |--- PreferredPropertyStar <= 4.50 | | | |--- 
NumberOfFollowups <= 5.25 | | | | |--- Occupation_Large Business <= 0.50 | | | | | |--- MonthlyIncome <= 16559.00 | | | | | | |--- MaritalStatus_Single <= 0.50 | | | | | | | |--- PitchSatisfactionScore <= 1.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- PitchSatisfactionScore > 1.50 | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | |--- MaritalStatus_Single > 0.50 | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | |--- MonthlyIncome > 16559.00 | | | | | | |--- ProductPitched_Standard <= 0.50 | | | | | | | |--- MonthlyIncome <= 20161.50 | | | | | | | | |--- MonthlyIncome <= 20153.50 | | | | | | | | | |--- NumberOfFollowups <= 4.50 | | | | | | | | | | |--- MonthlyIncome <= 19572.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- MonthlyIncome > 19572.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- NumberOfFollowups > 4.50 | | | | | | | | | | |--- Gender_Male <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- Gender_Male > 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- MonthlyIncome > 20153.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- MonthlyIncome > 20161.50 | | | | | | | | |--- Designation_Executive <= 0.50 | | | | | | | | | |--- MonthlyIncome <= 23250.50 | | | | | | | | | | |--- ProductPitched_Super Deluxe <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- ProductPitched_Super Deluxe > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- MonthlyIncome > 23250.50 | | | | | | | | | | |--- MonthlyIncome <= 23568.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- MonthlyIncome > 23568.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- Designation_Executive > 0.50 | | | | | | | | | |--- MonthlyIncome <= 24233.00 | | | | | | | | 
| | |--- Age <= 32.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- Age > 32.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- MonthlyIncome > 24233.00 | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | |--- ProductPitched_Standard > 0.50 | | | | | | | |--- DurationOfPitch <= 15.50 | | | | | | | | |--- MonthlyIncome <= 21584.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- MonthlyIncome > 21584.50 | | | | | | | | | |--- MonthlyIncome <= 25668.00 | | | | | | | | | | |--- NumberOfFollowups <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- NumberOfFollowups > 4.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- MonthlyIncome > 25668.00 | | | | | | | | | | |--- weights: [94.00, 0.00] class: 0 | | | | | | | |--- DurationOfPitch > 15.50 | | | | | | | | |--- Age <= 43.50 | | | | | | | | | |--- NumberOfTrips <= 6.50 | | | | | | | | | | |--- Age <= 31.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- Age > 31.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- NumberOfTrips > 6.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- Age > 43.50 | | | | | | | | | |--- NumberOfTrips <= 3.50 | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1 | | | | | | | | | |--- NumberOfTrips > 3.50 | | | | | | | | | | |--- DurationOfPitch <= 17.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- DurationOfPitch > 17.50 | | | | | | | | | | | |--- weights: [14.00, 0.00] class: 0 | | | | |--- Occupation_Large Business > 0.50 | | | | | |--- Age <= 57.50 | | | | | | |--- Age <= 30.50 | | | | | | | |--- NumberOfTrips <= 5.50 | | | | | | | | |--- MaritalStatus_Single <= 0.50 | | | | | | | | | |--- DurationOfPitch <= 6.50 | | | | | | | | | | |--- MonthlyIncome <= 18069.00 | | | | | | | | | | | |--- 
weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- MonthlyIncome > 18069.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- DurationOfPitch > 6.50 | | | | | | | | | | |--- weights: [21.00, 0.00] class: 0 | | | | | | | | |--- MaritalStatus_Single > 0.50 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- NumberOfTrips > 5.50 | | | | | | | | |--- MonthlyIncome <= 22756.00 | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | | |--- MonthlyIncome > 22756.00 | | | | | | | | | |--- MonthlyIncome <= 23691.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- MonthlyIncome > 23691.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- Age > 30.50 | | | | | | | |--- MonthlyIncome <= 17322.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- MonthlyIncome > 17322.50 | | | | | | | | |--- MonthlyIncome <= 32290.38 | | | | | | | | | |--- Age <= 56.50 | | | | | | | | | | |--- CityTier <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- CityTier > 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- Age > 56.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- MonthlyIncome > 32290.38 | | | | | | | | | |--- DurationOfPitch <= 14.50 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | |--- DurationOfPitch > 14.50 | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | |--- Age > 57.50 | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | |--- NumberOfFollowups > 5.25 | | | | |--- CityTier <= 1.50 | | | | | |--- NumberOfTrips <= 6.50 | | | | | | |--- MaritalStatus_Single <= 0.50 | | | | | | | |--- PitchSatisfactionScore <= 4.50 | | | | | | | | |--- weights: [20.00, 0.00] class: 0 | | | | | | | |--- PitchSatisfactionScore > 4.50 | | | | | | | | |--- Designation_Executive <= 0.50 | | | | | | | | | |--- weights: [4.00, 
0.00] class: 0 | | | | | | | | |--- Designation_Executive > 0.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- MaritalStatus_Single > 0.50 | | | | | | | |--- PitchSatisfactionScore <= 3.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- PitchSatisfactionScore > 3.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- NumberOfTrips > 6.50 | | | | | | |--- Gender_Male <= 0.50 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- Gender_Male > 0.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- CityTier > 1.50 | | | | | |--- Designation_Manager <= 0.50 | | | | | | |--- NumberOfTrips <= 4.50 | | | | | | | |--- weights: [0.00, 11.00] class: 1 | | | | | | |--- NumberOfTrips > 4.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- Designation_Manager > 0.50 | | | | | | |--- Age <= 43.00 | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | |--- Age > 43.00 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | |--- PreferredPropertyStar > 4.50 | | | |--- MaritalStatus_Single <= 0.50 | | | | |--- Age <= 46.50 | | | | | |--- MaritalStatus_Unmarried <= 0.50 | | | | | | |--- CityTier <= 1.50 | | | | | | | |--- NumberOfTrips <= 4.50 | | | | | | | | |--- DurationOfPitch <= 9.50 | | | | | | | | | |--- MonthlyIncome <= 21555.00 | | | | | | | | | | |--- MonthlyIncome <= 21202.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- MonthlyIncome > 21202.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- MonthlyIncome > 21555.00 | | | | | | | | | | |--- weights: [27.00, 0.00] class: 0 | | | | | | | | |--- DurationOfPitch > 9.50 | | | | | | | | | |--- weights: [82.00, 0.00] class: 0 | | | | | | | |--- NumberOfTrips > 4.50 | | | | | | | | |--- Age <= 41.50 | | | | | | | | | |--- Age <= 27.00 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- Age > 27.00 | | | | | | | | | | |--- 
MonthlyIncome <= 17150.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- MonthlyIncome > 17150.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- Age > 41.50 | | | | | | | | | |--- ProductPitched_Super Deluxe <= 0.50 | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | | |--- ProductPitched_Super Deluxe > 0.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- CityTier > 1.50 | | | | | | | |--- DurationOfPitch <= 15.50 | | | | | | | | |--- Age <= 43.50 | | | | | | | | | |--- Age <= 31.50 | | | | | | | | | | |--- PitchSatisfactionScore <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- PitchSatisfactionScore > 4.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- Age > 31.50 | | | | | | | | | | |--- weights: [40.00, 0.00] class: 0 | | | | | | | | |--- Age > 43.50 | | | | | | | | | |--- Age <= 45.50 | | | | | | | | | | |--- TypeofContact_Unknown <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | | |--- TypeofContact_Unknown > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- Age > 45.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | |--- DurationOfPitch > 15.50 | | | | | | | | |--- DurationOfPitch <= 20.50 | | | | | | | | | |--- ProductPitched_Super Deluxe <= 0.50 | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | | | |--- ProductPitched_Super Deluxe > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- DurationOfPitch > 20.50 | | | | | | | | | |--- NumberOfFollowups <= 2.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- NumberOfFollowups > 2.50 | | | | | | | | | | |--- Age <= 30.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- Age > 30.50 | | | | | | | | | | | |--- truncated branch of depth 2 
| | | | | |--- MaritalStatus_Unmarried > 0.50 | | | | | | |--- DurationOfPitch <= 14.50 | | | | | | | |--- MonthlyIncome <= 21941.00 | | | | | | | | |--- MonthlyIncome <= 21636.00 | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | |--- MonthlyIncome > 21636.00 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- MonthlyIncome > 21941.00 | | | | | | | | |--- weights: [15.00, 0.00] class: 0 | | | | | | |--- DurationOfPitch > 14.50 | | | | | | | |--- PitchSatisfactionScore <= 4.50 | | | | | | | | |--- MonthlyIncome <= 24403.50 | | | | | | | | | |--- DurationOfPitch <= 33.00 | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1 | | | | | | | | | |--- DurationOfPitch > 33.00 | | | | | | | | | | |--- Designation_Executive <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- Designation_Executive > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- MonthlyIncome > 24403.50 | | | | | | | | | |--- DurationOfPitch <= 27.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- DurationOfPitch > 27.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- PitchSatisfactionScore > 4.50 | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | |--- Age > 46.50 | | | | | |--- weights: [59.00, 0.00] class: 0 | | | |--- MaritalStatus_Single > 0.50 | | | | |--- DurationOfPitch <= 13.50 | | | | | |--- Age <= 30.50 | | | | | | |--- Age <= 28.50 | | | | | | | |--- OwnCar <= 0.50 | | | | | | | | |--- NumberOfPersonVisiting <= 2.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- NumberOfPersonVisiting > 2.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- OwnCar > 0.50 | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- Age > 28.50 | | | | | | | |--- Occupation_Salaried <= 0.50 | | | | | | | | |--- Gender_Male <= 0.50 | | | | | | | | | |--- weights: [1.00, 0.00] 
class: 0 | | | | | | | | |--- Gender_Male > 0.50 | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | |--- Occupation_Salaried > 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- Age > 30.50 | | | | | | |--- DurationOfPitch <= 11.00 | | | | | | | |--- weights: [20.00, 0.00] class: 0 | | | | | | |--- DurationOfPitch > 11.00 | | | | | | | |--- NumberOfTrips <= 2.50 | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | |--- NumberOfTrips > 2.50 | | | | | | | | |--- ProductPitched_Standard <= 0.50 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | |--- ProductPitched_Standard > 0.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- DurationOfPitch > 13.50 | | | | | |--- PitchSatisfactionScore <= 3.50 | | | | | | |--- Age <= 46.00 | | | | | | | |--- DurationOfPitch <= 32.00 | | | | | | | | |--- weights: [0.00, 14.00] class: 1 | | | | | | | |--- DurationOfPitch > 32.00 | | | | | | | | |--- Age <= 31.00 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- Age > 31.00 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- Age > 46.00 | | | | | | | |--- NumberOfTrips <= 3.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- NumberOfTrips > 3.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- PitchSatisfactionScore > 3.50 | | | | | | |--- DurationOfPitch <= 18.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- DurationOfPitch > 18.50 | | | | | | | |--- weights: [5.00, 0.00] class: 0 |--- Passport > 0.50 | |--- Designation_Executive <= 0.50 | | |--- CityTier <= 2.50 | | | |--- DurationOfPitch <= 27.50 | | | | |--- Age <= 54.50 | | | | | |--- PitchSatisfactionScore <= 4.50 | | | | | | |--- Age <= 27.50 | | | | | | | |--- Gender_Male <= 0.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- Gender_Male > 0.50 | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | |--- 
Age > 27.50 | | | | | | | |--- weights: [193.00, 0.00] class: 0 | | | | | |--- PitchSatisfactionScore > 4.50 | | | | | | |--- NumberOfTrips <= 2.50 | | | | | | | |--- NumberOfChildrenVisiting <= 1.50 | | | | | | | | |--- PreferredPropertyStar <= 3.50 | | | | | | | | | |--- Age <= 35.00 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | |--- Age > 35.00 | | | | | | | | | | |--- DurationOfPitch <= 14.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- DurationOfPitch > 14.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- PreferredPropertyStar > 3.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- NumberOfChildrenVisiting > 1.50 | | | | | | | | |--- MaritalStatus_Single <= 0.50 | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | |--- MaritalStatus_Single > 0.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- NumberOfTrips > 2.50 | | | | | | | |--- NumberOfFollowups <= 5.25 | | | | | | | | |--- weights: [25.00, 0.00] class: 0 | | | | | | | |--- NumberOfFollowups > 5.25 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- Age > 54.50 | | | | | |--- NumberOfTrips <= 5.50 | | | | | | |--- ProductPitched_Standard <= 0.50 | | | | | | | |--- Designation_VP <= 0.50 | | | | | | | | |--- weights: [21.00, 0.00] class: 0 | | | | | | | |--- Designation_VP > 0.50 | | | | | | | | |--- PitchSatisfactionScore <= 2.50 | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | |--- PitchSatisfactionScore > 2.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- ProductPitched_Standard > 0.50 | | | | | | | |--- MonthlyIncome <= 29210.00 | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- MonthlyIncome > 29210.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- NumberOfTrips > 5.50 | | | | | | |--- MaritalStatus_Unmarried <= 0.50 | | | | | | | |--- weights: 
[0.00, 4.00] class: 1 | | | | | | |--- MaritalStatus_Unmarried > 0.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | |--- DurationOfPitch > 27.50 | | | | |--- Age <= 49.00 | | | | | |--- Age <= 46.50 | | | | | | |--- NumberOfTrips <= 1.50 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- NumberOfTrips > 1.50 | | | | | | | |--- PitchSatisfactionScore <= 3.50 | | | | | | | | |--- MonthlyIncome <= 28409.50 | | | | | | | | | |--- weights: [15.00, 0.00] class: 0 | | | | | | | | |--- MonthlyIncome > 28409.50 | | | | | | | | | |--- MonthlyIncome <= 28847.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- MonthlyIncome > 28847.00 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- PitchSatisfactionScore > 3.50 | | | | | | | | |--- Gender_Male <= 0.50 | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | |--- Gender_Male > 0.50 | | | | | | | | | |--- Occupation_Large Business <= 0.50 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | |--- Occupation_Large Business > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- Age > 46.50 | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | |--- Age > 49.00 | | | | | |--- weights: [17.00, 0.00] class: 0 | | |--- CityTier > 2.50 | | | |--- DurationOfPitch <= 17.50 | | | | |--- MaritalStatus_Married <= 0.50 | | | | | |--- MonthlyIncome <= 21828.50 | | | | | | |--- Occupation_Salaried <= 0.50 | | | | | | | |--- Age <= 26.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- Age > 26.50 | | | | | | | | |--- PitchSatisfactionScore <= 3.50 | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | |--- PitchSatisfactionScore > 3.50 | | | | | | | | | |--- Age <= 42.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- Age > 42.50 | | | | | | | | | | |--- Occupation_Small Business <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] 
class: 1 | | | | | | | | | | |--- Occupation_Small Business > 0.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- Occupation_Salaried > 0.50 | | | | | | | |--- PitchSatisfactionScore <= 4.50 | | | | | | | | |--- MonthlyIncome <= 19523.00 | | | | | | | | | |--- PitchSatisfactionScore <= 2.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- PitchSatisfactionScore > 2.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- MonthlyIncome > 19523.00 | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | |--- PitchSatisfactionScore > 4.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- MonthlyIncome > 21828.50 | | | | | | |--- MaritalStatus_Unmarried <= 0.50 | | | | | | | |--- PitchSatisfactionScore <= 4.50 | | | | | | | | |--- NumberOfFollowups <= 4.50 | | | | | | | | | |--- weights: [19.00, 0.00] class: 0 | | | | | | | | |--- NumberOfFollowups > 4.50 | | | | | | | | | |--- PreferredPropertyStar <= 3.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- PreferredPropertyStar > 3.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- PitchSatisfactionScore > 4.50 | | | | | | | | |--- Age <= 43.00 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- Age > 43.00 | | | | | | | | | |--- ProductPitched_Super Deluxe <= 0.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- ProductPitched_Super Deluxe > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- MaritalStatus_Unmarried > 0.50 | | | | | | | |--- DurationOfPitch <= 8.50 | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | |--- DurationOfPitch > 8.50 | | | | | | | | |--- PreferredPropertyStar <= 3.50 | | | | | | | | | |--- Occupation_Small Business <= 0.50 | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | | | | | | |--- Occupation_Small Business > 0.50 | | 
| | | | | | | | |--- Age <= 37.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- Age > 37.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- PreferredPropertyStar > 3.50 | | | | | | | | | |--- DurationOfPitch <= 11.00 | | | | | | | | | | |--- Occupation_Small Business <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- Occupation_Small Business > 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- DurationOfPitch > 11.00 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | |--- MaritalStatus_Married > 0.50 | | | | | |--- NumberOfFollowups <= 5.25 | | | | | | |--- MonthlyIncome <= 21815.00 | | | | | | | |--- MonthlyIncome <= 21745.00 | | | | | | | | |--- DurationOfPitch <= 9.50 | | | | | | | | | |--- PitchSatisfactionScore <= 3.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- PitchSatisfactionScore > 3.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- DurationOfPitch > 9.50 | | | | | | | | | |--- weights: [17.00, 0.00] class: 0 | | | | | | | |--- MonthlyIncome > 21745.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- MonthlyIncome > 21815.00 | | | | | | | |--- weights: [64.00, 0.00] class: 0 | | | | | |--- NumberOfFollowups > 5.25 | | | | | | |--- Age <= 35.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- Age > 35.50 | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | |--- DurationOfPitch > 17.50 | | | | |--- MaritalStatus_Single <= 0.50 | | | | | |--- MonthlyIncome <= 28257.00 | | | | | | |--- NumberOfFollowups <= 4.50 | | | | | | | |--- DurationOfPitch <= 29.50 | | | | | | | | |--- Age <= 34.50 | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | |--- Age > 34.50 | | | | | | | | | |--- TypeofContact_Self Enquiry <= 0.50 | | | | | | | | | | |--- ProductPitched_Standard <= 0.50 | | | | | | | | 
| | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- ProductPitched_Standard > 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- TypeofContact_Self Enquiry > 0.50 | | | | | | | | | | |--- MonthlyIncome <= 21731.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- MonthlyIncome > 21731.00 | | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | |--- DurationOfPitch > 29.50 | | | | | | | | |--- NumberOfChildrenVisiting <= 2.50 | | | | | | | | | |--- Occupation_Large Business <= 0.50 | | | | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | | | | | |--- Occupation_Large Business > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- NumberOfChildrenVisiting > 2.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- NumberOfFollowups > 4.50 | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | |--- MonthlyIncome > 28257.00 | | | | | | |--- NumberOfFollowups <= 4.50 | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | |--- NumberOfFollowups > 4.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- MaritalStatus_Single > 0.50 | | | | | |--- MonthlyIncome <= 24568.00 | | | | | | |--- MonthlyIncome <= 20372.50 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- MonthlyIncome > 20372.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- MonthlyIncome > 24568.00 | | | | | | |--- weights: [0.00, 13.00] class: 1 | |--- Designation_Executive > 0.50 | | |--- NumberOfFollowups <= 2.50 | | | |--- TypeofContact_Self Enquiry <= 0.50 | | | | |--- NumberOfTrips <= 3.00 | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- NumberOfTrips > 3.00 | | | | | |--- weights: [0.00, 2.00] class: 1 | | | |--- TypeofContact_Self Enquiry > 0.50 | | | | |--- MonthlyIncome <= 16907.00 | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- MonthlyIncome > 16907.00 | | | | | |--- 
NumberOfPersonVisiting <= 3.50 | | | | | | |--- weights: [20.00, 0.00] class: 0 | | | | | |--- NumberOfPersonVisiting > 3.50 | | | | | | |--- MonthlyIncome <= 20887.50 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- MonthlyIncome > 20887.50 | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | |--- NumberOfFollowups > 2.50 | | | |--- DurationOfPitch <= 22.50 | | | | |--- Age <= 25.50 | | | | | |--- Occupation_Large Business <= 0.50 | | | | | | |--- PitchSatisfactionScore <= 1.50 | | | | | | | |--- Occupation_Salaried <= 0.50 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- Occupation_Salaried > 0.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- PitchSatisfactionScore > 1.50 | | | | | | | |--- MaritalStatus_Married <= 0.50 | | | | | | | | |--- weights: [0.00, 30.00] class: 1 | | | | | | | |--- MaritalStatus_Married > 0.50 | | | | | | | | |--- PitchSatisfactionScore <= 3.50 | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | |--- PitchSatisfactionScore > 3.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- Occupation_Large Business > 0.50 | | | | | | |--- CityTier <= 2.00 | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- CityTier > 2.00 | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | |--- Age > 25.50 | | | | | |--- NumberOfPersonVisiting <= 2.50 | | | | | | |--- NumberOfFollowups <= 3.50 | | | | | | | |--- MonthlyIncome <= 19708.00 | | | | | | | | |--- MonthlyIncome <= 18267.00 | | | | | | | | | |--- Age <= 47.50 | | | | | | | | | | |--- Age <= 26.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- Age > 26.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- Age > 47.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- MonthlyIncome > 18267.00 | | | | | | | | | |--- NumberOfTrips <= 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | 
| | | |--- NumberOfTrips > 1.50 | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | |--- MonthlyIncome > 19708.00 | | | | | | | | |--- weights: [13.00, 0.00] class: 0 | | | | | | |--- NumberOfFollowups > 3.50 | | | | | | | |--- NumberOfTrips <= 2.50 | | | | | | | | |--- Gender_Male <= 0.50 | | | | | | | | | |--- MonthlyIncome <= 17330.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- MonthlyIncome > 17330.50 | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | |--- Gender_Male > 0.50 | | | | | | | | | |--- MonthlyIncome <= 17693.00 | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | | | |--- MonthlyIncome > 17693.00 | | | | | | | | | | |--- MonthlyIncome <= 20466.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- MonthlyIncome > 20466.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- NumberOfTrips > 2.50 | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | |--- NumberOfPersonVisiting > 2.50 | | | | | | |--- MaritalStatus_Single <= 0.50 | | | | | | | |--- MonthlyIncome <= 23624.50 | | | | | | | | |--- PreferredPropertyStar <= 4.50 | | | | | | | | | |--- Age <= 44.00 | | | | | | | | | | |--- PitchSatisfactionScore <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- PitchSatisfactionScore > 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- Age > 44.00 | | | | | | | | | | |--- NumberOfTrips <= 4.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- NumberOfTrips > 4.00 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- PreferredPropertyStar > 4.50 | | | | | | | | | |--- Age <= 41.50 | | | | | | | | | | |--- DurationOfPitch <= 20.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- DurationOfPitch > 20.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] 
class: 0 | | | | | | | | | |--- Age > 41.50 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | |--- MonthlyIncome > 23624.50 | | | | | | | | |--- NumberOfFollowups <= 4.50 | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | | |--- NumberOfFollowups > 4.50 | | | | | | | | | |--- Occupation_Salaried <= 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- Occupation_Salaried > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- MaritalStatus_Single > 0.50 | | | | | | | |--- DurationOfPitch <= 12.50 | | | | | | | | |--- Occupation_Small Business <= 0.50 | | | | | | | | | |--- DurationOfPitch <= 11.50 | | | | | | | | | | |--- PreferredPropertyStar <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- PreferredPropertyStar > 4.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- DurationOfPitch > 11.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- Occupation_Small Business > 0.50 | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | |--- DurationOfPitch > 12.50 | | | | | | | | |--- weights: [0.00, 12.00] class: 1 | | | |--- DurationOfPitch > 22.50 | | | | |--- DurationOfPitch <= 32.50 | | | | | |--- NumberOfPersonVisiting <= 1.50 | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- NumberOfPersonVisiting > 1.50 | | | | | | |--- MonthlyIncome <= 22644.50 | | | | | | | |--- DurationOfPitch <= 23.50 | | | | | | | | |--- Age <= 24.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- Age > 24.50 | | | | | | | | | |--- NumberOfTrips <= 1.50 | | | | | | | | | | |--- CityTier <= 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- CityTier > 1.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- NumberOfTrips > 1.50 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | 
|--- DurationOfPitch > 23.50 | | | | | | | | |--- weights: [0.00, 44.00] class: 1 | | | | | | |--- MonthlyIncome > 22644.50 | | | | | | | |--- NumberOfTrips <= 3.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- NumberOfTrips > 3.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- DurationOfPitch > 32.50 | | | | | |--- weights: [2.00, 0.00] class: 0
importances = decision_tree.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
default_hyperparameters = pd.concat(
    [
        bagging_metrics_train.T,
        bagging_metrics_test.T,
        rf_metrics_train.T,
        rf_metrics_test.T,
        tree_metrics_train.T,
        tree_metrics_test.T,
    ],
    axis=1,
)
default_hyperparameters.columns = [
    "bagging_metrics_train",
    "bagging_metrics_test",
    "random_forest_metrics_train",
    "random_forest_metrics_test",
    "tree_metrics_train",
    "tree_metrics_test",
]
default_hyperparameters
| | bagging_metrics_train | bagging_metrics_test | random_forest_metrics_train | random_forest_metrics_test | tree_metrics_train | tree_metrics_test |
|---|---|---|---|---|---|---|
| Accuracy | 0.994281 | 0.903860 | 1.0 | 0.910877 | 1.0 | 0.893333 |
| Recall | 0.971246 | 0.630597 | 1.0 | 0.597015 | 1.0 | 0.742537 |
| Precision | 0.998358 | 0.816425 | 1.0 | 0.893855 | 1.0 | 0.705674 |
| F1 | 0.984615 | 0.711579 | 1.0 | 0.715884 | 1.0 | 0.723636 |
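All three default models score near-perfectly on the training data and much lower on the test data, which points to overfitting. A minimal sketch of how that gap could be quantified is shown here; the recall values are copied from the table above, while the dictionary layout and the helper name `generalization_gap` are illustrative assumptions rather than part of the notebook.

```python
# Train/test recall copied from the default-model comparison table above.
default_recall = {
    "bagging": {"train": 0.971246, "test": 0.630597},
    "random_forest": {"train": 1.0, "test": 0.597015},
    "decision_tree": {"train": 1.0, "test": 0.742537},
}


def generalization_gap(scores):
    """Hypothetical helper: train score minus test score (larger = more overfit)."""
    return round(scores["train"] - scores["test"], 6)


for model, scores in default_recall.items():
    print(f"{model:14s} recall gap = {generalization_gap(scores):.6f}")
```

The smaller the gap, the better the model generalizes; the tuning steps that follow are one way to try to shrink it.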
Bagging Classifier
Random Forest Classifier
Decision Tree Classifier
# Build the model:
bagging_tuned = BaggingClassifier(random_state=1)
bagging_tuned
BaggingClassifier(random_state=1)
# Parameters for tuning:
parameters = {
    # Floats are fractions; the integer 1 means one sample/feature, not 100% (use 1.0 for that)
    "max_samples": [0.7, 0.8, 0.9, 1],
    "max_features": [0.7, 0.8, 0.9, 1],
    "n_estimators": [10, 20, 30, 40, 50],
}
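One detail of this grid deserves care: scikit-learn documents `max_samples` and `max_features` as a fraction when given a float and as an absolute count when given an int. The sketch below mimics that rule with a hypothetical helper, `resolve_max_samples`, and an illustrative row count of 1000 (not the notebook's training size); the exact rounding inside scikit-learn may differ.

```python
def resolve_max_samples(max_samples, n_rows):
    """Hypothetical helper mirroring the documented int/float rule."""
    # bool is excluded because it is a subclass of int
    if isinstance(max_samples, int) and not isinstance(max_samples, bool):
        return max_samples  # int: absolute number of rows
    return int(max_samples * n_rows)  # float: fraction of the rows


n_rows = 1000  # illustrative size only, not taken from the notebook

for value in (0.9, 1, 1.0):
    rows = resolve_max_samples(value, n_rows)
    print(f"max_samples={value!r} -> {rows} rows per base estimator")
```

Under this rule the candidate `1` trains each base estimator on a single row (or a single feature), so writing `1.0` is the way to request the full data.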
# Type of scoring used to compare parameter combinations - We will choose Recall
recall_score = metrics.make_scorer(metrics.recall_score)
# Run the grid search
bagging_grid_search = GridSearchCV(
    bagging_tuned, parameters, scoring=recall_score, cv=5
)
bagging_grid_search.fit(x_train, y_train)
GridSearchCV(cv=5, estimator=BaggingClassifier(random_state=1),
param_grid={'max_features': [0.7, 0.8, 0.9, 1],
'max_samples': [0.7, 0.8, 0.9, 1],
'n_estimators': [10, 20, 30, 40, 50]},
scoring=make_scorer(recall_score))
# Set the grid search object to the best combination of parameters
bagging_tuned = bagging_grid_search.best_estimator_
bagging_tuned
BaggingClassifier(max_features=0.9, max_samples=0.9, n_estimators=50,
random_state=1)
# Calculate the model metrics for training dataset
bagging_tuned_metrics_train = model_metrics(bagging_tuned, x_train, y_train)
bagging_tuned_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0 | 1.0 |
# Confusion matrix for training dataset
confusion_matrix(bagging_tuned, x_train, y_train)
# Calculate the model metrics for testing dataset
bagging_tuned_metrics_test = model_metrics(bagging_tuned, x_test, y_test)
bagging_tuned_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.915088 | 0.63806 | 0.876923 | 0.738661 |
# Confusion matrix for testing dataset
confusion_matrix(bagging_tuned, x_test, y_test)
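The four reported metrics all derive from the confusion matrix counts. The sketch below shows the arithmetic with a hypothetical helper, `classification_metrics`. The counts are an assumption: they were back-calculated from the test ratios reported above (the plotted matrix is not reproduced in this text), so treat them as illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, recall, precision and F1 from confusion-matrix counts."""
    recall = tp / (tp + fn)  # share of actual buyers the model finds
    precision = tp / (tp + fp)  # share of predicted buyers who actually buy
    return {
        "Accuracy": (tp + tn) / (tp + fp + fn + tn),
        "Recall": recall,
        "Precision": precision,
        "F1": 2 * precision * recall / (precision + recall),
    }


# Assumed counts, inferred from the reported test metrics (not notebook output).
scores = classification_metrics(tp=171, fp=24, fn=97, tn=1133)

for name, value in scores.items():
    print(f"{name}: {value:.6f}")
```

Read this way, a recall of about 0.64 means that roughly one in three actual buyers in the test set would still be missed.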
# Note: scikit-learn 1.2 renamed `base_estimator` to `estimator`; the old name was removed in 1.4
bagging_weighted = BaggingClassifier(
    base_estimator=DecisionTreeClassifier(
        criterion="gini", class_weight={0: 0.3, 1: 0.7}, random_state=1
    ),
    random_state=1,
)
bagging_weighted.fit(x_train, y_train)
BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight={0: 0.3,
1: 0.7},
random_state=1),
random_state=1)
# Calculate the model metrics for training dataset
bagging_weighted_metrics_train = model_metrics(bagging_weighted, x_train, y_train)
bagging_weighted_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.994582 | 0.972843 | 0.998361 | 0.985437 |
# Confusion matrix for training dataset
confusion_matrix(bagging_weighted, x_train, y_train)
# Calculate the model metrics for testing dataset
bagging_weighted_metrics_test = model_metrics(bagging_weighted, x_test, y_test)
bagging_weighted_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.911579 | 0.615672 | 0.87766 | 0.723684 |
# Confusion matrix for testing dataset
confusion_matrix(bagging_weighted, x_test, y_test)
# Build the model
random_forest_tuned = RandomForestClassifier(random_state=1)
# Parameters for tuning:
parameters = {
    "n_estimators": [150, 200, 250],
    "min_samples_leaf": np.arange(5, 10),
    "max_features": np.arange(0.2, 0.7, 0.1),
    "max_samples": np.arange(0.3, 0.7, 0.1),
}
# Type of scoring used to compare parameter combinations - We will choose Recall
recall_score = metrics.make_scorer(metrics.recall_score)
# Run the grid search
random_forest_grid_search = GridSearchCV(
    random_forest_tuned, parameters, scoring=recall_score, cv=5
)
random_forest_grid_search.fit(x_train, y_train)
GridSearchCV(cv=5, estimator=RandomForestClassifier(random_state=1),
param_grid={'max_features': array([0.2, 0.3, 0.4, 0.5, 0.6]),
'max_samples': array([0.3, 0.4, 0.5, 0.6]),
'min_samples_leaf': array([5, 6, 7, 8, 9]),
'n_estimators': [150, 200, 250]},
scoring=make_scorer(recall_score))
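An exhaustive grid search fits one model per parameter combination per fold, so its cost grows multiplicatively. The sketch below counts the fits for the random forest grid shown above; the candidate counts are read off the printed `param_grid`, and the final `+ 1` assumes the default `refit=True`, which retrains the best candidate on the full training set.

```python
from math import prod

# Number of candidate values per hyperparameter, read off the grid above.
grid_sizes = {
    "n_estimators": 3,
    "min_samples_leaf": 5,
    "max_features": 5,
    "max_samples": 4,
}
cv_folds = 5

n_candidates = prod(grid_sizes.values())
n_fits = n_candidates * cv_folds + 1  # +1 for the final refit (assumes refit=True)

print(f"{n_candidates} candidates x {cv_folds} folds + 1 refit = {n_fits} fits")
```

Each of those fits builds 150 to 250 trees, which is why this cell can take a long time to run.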
# Set the grid search object to the best combination of parameters
random_forest_tuned = random_forest_grid_search.best_estimator_
random_forest_tuned
RandomForestClassifier(max_features=0.6000000000000001,
max_samples=0.6000000000000001, min_samples_leaf=5,
n_estimators=150, random_state=1)
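The value `0.6000000000000001` in the printed estimator is a floating-point artifact of building the grid from a `0.1` step, not a different setting from 0.6. The pure-Python sketch below reproduces the same kind of drift and shows one possible cleanup; it illustrates the effect only and is not a claim about how NumPy computes `arange` internally.

```python
start, step, count = 0.2, 0.1, 5

# Repeated use of a step that has no exact binary representation drifts slightly.
raw = [start + i * step for i in range(count)]

# Rounding to one decimal (or writing the list out by hand) gives clean values.
clean = [round(value, 1) for value in raw]

print(raw)
print(clean)
```

The drift is harmless for model fitting, but explicit lists such as `[0.2, 0.3, 0.4, 0.5, 0.6]` keep the reported best parameters easier to read.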
# Calculate the model metrics for training dataset
rf_tuned_metrics_train = model_metrics(random_forest_tuned, x_train, y_train)
rf_tuned_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.917219 | 0.595847 | 0.944304 | 0.730656 |
# Confusion matrix for training dataset
confusion_matrix(random_forest_tuned, x_train, y_train)
# Calculate the model metrics for testing dataset
rf_tuned_metrics_test = model_metrics(random_forest_tuned, x_test, y_test)
rf_tuned_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.861754 | 0.38806 | 0.759124 | 0.51358 |
# Confusion matrix for testing dataset
confusion_matrix(random_forest_tuned, x_test, y_test)
# Build the model
rf_weighted = weighted_random_forest = RandomForestClassifier(random_state=1)
# Parameters for tuning:
parameters = {
    "class_weight": [{0: 0.3, 1: 0.7}],
    "n_estimators": [100, 150, 200, 250],
    "min_samples_leaf": np.arange(5, 10),
    "max_features": np.arange(0.2, 0.7, 0.1),
    "max_samples": np.arange(0.3, 0.7, 0.1),
}
# Type of scoring used to compare parameter combinations - We will choose Recall
recall_score = metrics.make_scorer(metrics.recall_score)
# Run the grid search
random_forest_weighted_grid_search = GridSearchCV(
    rf_weighted, parameters, scoring=recall_score, cv=5
)
random_forest_weighted_grid_search.fit(x_train, y_train)
GridSearchCV(cv=5, estimator=RandomForestClassifier(random_state=1),
param_grid={'class_weight': [{0: 0.3, 1: 0.7}],
'max_features': array([0.2, 0.3, 0.4, 0.5, 0.6]),
'max_samples': array([0.3, 0.4, 0.5, 0.6]),
'min_samples_leaf': array([5, 6, 7, 8, 9]),
'n_estimators': [100, 150, 200, 250]},
scoring=make_scorer(recall_score))
# Set the grid search object to the best combination of parameters
rf_weighted = random_forest_weighted_grid_search.best_estimator_
rf_weighted
RandomForestClassifier(class_weight={0: 0.3, 1: 0.7},
max_features=0.6000000000000001,
max_samples=0.6000000000000001, min_samples_leaf=9,
random_state=1)
# Calculate the model metrics for training dataset
rf_weighted_metrics_train = model_metrics(rf_weighted, x_train, y_train)
rf_weighted_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.89404 | 0.691693 | 0.731419 | 0.711002 |
# Confusion matrix for training dataset
confusion_matrix(rf_weighted, x_train, y_train)
# Calculate the model metrics for testing dataset
rf_weighted_metrics_test = model_metrics(rf_weighted, x_test, y_test)
rf_weighted_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.842105 | 0.514925 | 0.592275 | 0.550898 |
# Confusion matrix for testing dataset
confusion_matrix(rf_weighted, x_test, y_test)
# Build the model
decision_tree_tuned = DecisionTreeClassifier(random_state=1)
# Parameters for tuning:
parameters = {
    "class_weight": [{0: 0.3, 1: 0.7}],
    "max_depth": np.arange(2, 10),
    "min_samples_leaf": [5, 7, 10, 15],
    "max_leaf_nodes": [2, 3, 5, 10, 15],
    "min_impurity_decrease": [0.0001, 0.001, 0.01, 0.1],
}
# Type of scoring used to compare parameter combinations - We will choose Recall
recall_score = metrics.make_scorer(metrics.recall_score)
# Run the grid search
tree_tuned_grid_search = GridSearchCV(
    decision_tree_tuned, parameters, scoring=recall_score, n_jobs=-1
)
tree_tuned_grid_search = tree_tuned_grid_search.fit(x_train, y_train)
# Set the grid search object to the best combination of parameters
decision_tree_tuned = tree_tuned_grid_search.best_estimator_
decision_tree_tuned
DecisionTreeClassifier(class_weight={0: 0.3, 1: 0.7}, max_depth=6,
max_leaf_nodes=15, min_impurity_decrease=0.0001,
min_samples_leaf=15, random_state=1)
# Calculate the model metrics for training dataset
tree_tuned_metrics_train = model_metrics(decision_tree_tuned, x_train, y_train)
tree_tuned_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.829922 | 0.597444 | 0.544396 | 0.569688 |
# Confusion matrix for training dataset
confusion_matrix(decision_tree_tuned, x_train, y_train)
# Calculate the model metrics for testing dataset
tree_tuned_metrics_test = model_metrics(decision_tree_tuned, x_test, y_test)
tree_tuned_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.804211 | 0.526119 | 0.481229 | 0.502674 |
# Confusion matrix for testing dataset
confusion_matrix(decision_tree_tuned, x_test, y_test)
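Since recall is the metric chosen for model selection, it helps to rank the test recall of every model evaluated so far. The values in this sketch are copied from the metric tables above; the model labels are shorthand introduced here for illustration.

```python
# Test-set recall copied from the metric tables above.
test_recall = {
    "bagging (default)": 0.630597,
    "bagging (tuned)": 0.638060,
    "bagging (weighted)": 0.615672,
    "random forest (default)": 0.597015,
    "random forest (tuned)": 0.388060,
    "random forest (weighted)": 0.514925,
    "decision tree (default)": 0.742537,
    "decision tree (tuned)": 0.526119,
}

# Sort from highest to lowest recall.
ranked = sorted(test_recall.items(), key=lambda item: item[1], reverse=True)

for rank, (model, recall) in enumerate(ranked, start=1):
    print(f"{rank}. {model:26s} {recall:.6f}")
```

On these numbers the grid-searched random forest and decision tree reach lower test recall than their default versions, even though their train/test gap is narrower.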
plt.figure(figsize=(15, 10))
tree.plot_tree(
    decision_tree_tuned,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=True,
    class_names=True,
)
plt.show()
# Text report showing the rules of the tuned decision tree
print(
    tree.export_text(
        decision_tree_tuned, feature_names=feature_names, show_weights=True
    )
)
|--- Passport <= 0.50
|   |--- Age <= 22.50
|   |   |--- PitchSatisfactionScore <= 2.50
|   |   |   |--- weights: [5.40, 0.00] class: 0
|   |   |--- PitchSatisfactionScore > 2.50
|   |   |   |--- weights: [8.40, 25.20] class: 1
|   |--- Age > 22.50
|   |   |--- PreferredPropertyStar <= 4.50
|   |   |   |--- Occupation_Large Business <= 0.50
|   |   |   |   |--- NumberOfFollowups <= 5.25
|   |   |   |   |   |--- weights: [453.90, 83.30] class: 0
|   |   |   |   |--- NumberOfFollowups > 5.25
|   |   |   |   |   |--- weights: [10.50, 11.90] class: 1
|   |   |   |--- Occupation_Large Business > 0.50
|   |   |   |   |--- weights: [37.20, 24.50] class: 0
|   |   |--- PreferredPropertyStar > 4.50
|   |   |   |--- MaritalStatus_Single <= 0.50
|   |   |   |   |--- Age <= 46.50
|   |   |   |   |   |--- CityTier <= 1.50
|   |   |   |   |   |   |--- weights: [49.50, 15.40] class: 0
|   |   |   |   |   |--- CityTier > 1.50
|   |   |   |   |   |   |--- weights: [26.40, 25.20] class: 0
|   |   |   |   |--- Age > 46.50
|   |   |   |   |   |--- weights: [17.70, 0.00] class: 0
|   |   |   |--- MaritalStatus_Single > 0.50
|   |   |   |   |--- weights: [11.40, 18.90] class: 1
|--- Passport > 0.50
|   |--- Designation_Executive <= 0.50
|   |   |--- CityTier <= 2.50
|   |   |   |--- weights: [90.90, 22.40] class: 0
|   |   |--- CityTier > 2.50
|   |   |   |--- MaritalStatus_Married <= 0.50
|   |   |   |   |--- weights: [19.20, 45.50] class: 1
|   |   |   |--- MaritalStatus_Married > 0.50
|   |   |   |   |--- DurationOfPitch <= 19.50
|   |   |   |   |   |--- weights: [26.10, 2.10] class: 0
|   |   |   |   |--- DurationOfPitch > 19.50
|   |   |   |   |   |--- weights: [3.00, 8.40] class: 1
|   |--- Designation_Executive > 0.50
|   |   |--- NumberOfFollowups <= 2.50
|   |   |   |--- weights: [7.80, 3.50] class: 0
|   |   |--- NumberOfFollowups > 2.50
|   |   |   |--- weights: [41.40, 151.90] class: 1
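The leaf `weights` in this report are not raw sample counts: with `class_weight={0: 0.3, 1: 0.7}`, every non-buyer appears to contribute 0.3 and every buyer 0.7. The sketch below converts a few leaves back to approximate counts under that assumption (which matches the scikit-learn version that produced this output; newer versions may report values differently). The helper name `leaf_counts` and the leaf labels are illustrative.

```python
CLASS_WEIGHT = {0: 0.3, 1: 0.7}  # the weights passed to the tuned tree


def leaf_counts(weighted_values):
    """Hypothetical helper: divide each weighted total by its class weight."""
    return [
        round(weight / CLASS_WEIGHT[label])
        for label, weight in enumerate(weighted_values)
    ]


# Weighted leaf values copied from the text report above.
leaves = {
    "NumberOfFollowups > 5.25": [10.50, 11.90],
    "MaritalStatus_Single > 0.50": [11.40, 18.90],
    "CityTier > 1.50": [26.40, 25.20],
}

for rule, weights in leaves.items():
    counts = leaf_counts(weights)
    predicted = weights.index(max(weights))  # the class with the larger weighted total wins
    print(f"{rule}: counts={counts}, predicted class={predicted}")
```

In the first leaf about 35 non-buyers and 17 buyers are still labelled class 1, which shows how the class weights trade precision for recall.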
importances = decision_tree_tuned.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
tree_cost_complexity = DecisionTreeClassifier(random_state=1)
path = tree_cost_complexity.cost_complexity_pruning_path(x_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
pd.DataFrame(path)
| | ccp_alphas | impurities |
|---|---|---|
| 0 | 0.000000 | 0.000000 |
| 1 | 0.000194 | 0.000583 |
| 2 | 0.000200 | 0.001183 |
| 3 | 0.000234 | 0.002353 |
| 4 | 0.000239 | 0.003547 |
| 5 | 0.000241 | 0.004029 |
| 6 | 0.000251 | 0.005032 |
| 7 | 0.000258 | 0.005549 |
| 8 | 0.000258 | 0.006065 |
| 9 | 0.000263 | 0.007645 |
| 10 | 0.000268 | 0.008180 |
| 11 | 0.000268 | 0.008715 |
| 12 | 0.000269 | 0.009793 |
| 13 | 0.000271 | 0.010334 |
| 14 | 0.000271 | 0.010876 |
| 15 | 0.000271 | 0.011418 |
| 16 | 0.000276 | 0.011970 |
| 17 | 0.000279 | 0.013085 |
| 18 | 0.000280 | 0.013644 |
| 19 | 0.000280 | 0.014203 |
| 20 | 0.000282 | 0.014767 |
| 21 | 0.000283 | 0.015334 |
| 22 | 0.000283 | 0.015901 |
| 23 | 0.000284 | 0.016469 |
| 24 | 0.000284 | 0.017038 |
| 25 | 0.000287 | 0.017611 |
| 26 | 0.000288 | 0.018187 |
| 27 | 0.000288 | 0.018763 |
| 28 | 0.000288 | 0.019339 |
| 29 | 0.000288 | 0.019915 |
| 30 | 0.000288 | 0.020491 |
| 31 | 0.000293 | 0.021076 |
| 32 | 0.000293 | 0.022248 |
| 33 | 0.000293 | 0.022834 |
| 34 | 0.000294 | 0.023422 |
| 35 | 0.000294 | 0.024598 |
| 36 | 0.000297 | 0.025788 |
| 37 | 0.000298 | 0.026384 |
| 38 | 0.000298 | 0.026981 |
| 39 | 0.000299 | 0.027578 |
| 40 | 0.000300 | 0.028177 |
| 41 | 0.000303 | 0.028782 |
| 42 | 0.000348 | 0.029826 |
| 43 | 0.000359 | 0.030903 |
| 44 | 0.000361 | 0.031626 |
| 45 | 0.000364 | 0.032354 |
| 46 | 0.000378 | 0.033488 |
| 47 | 0.000380 | 0.034629 |
| 48 | 0.000391 | 0.035801 |
| 49 | 0.000401 | 0.036203 |
| 50 | 0.000401 | 0.036604 |
| 51 | 0.000401 | 0.037006 |
| 52 | 0.000401 | 0.037808 |
| 53 | 0.000401 | 0.038210 |
| 54 | 0.000401 | 0.038611 |
| 55 | 0.000401 | 0.039012 |
| 56 | 0.000401 | 0.039414 |
| 57 | 0.000401 | 0.039815 |
| 58 | 0.000401 | 0.040216 |
| 59 | 0.000401 | 0.040618 |
| 60 | 0.000406 | 0.041024 |
| 61 | 0.000420 | 0.042704 |
| 62 | 0.000428 | 0.043132 |
| 63 | 0.000428 | 0.043561 |
| 64 | 0.000430 | 0.044421 |
| 65 | 0.000440 | 0.046182 |
| 66 | 0.000442 | 0.046624 |
| 67 | 0.000452 | 0.047076 |
| 68 | 0.000452 | 0.047527 |
| 69 | 0.000452 | 0.047979 |
| 70 | 0.000452 | 0.048430 |
| 71 | 0.000452 | 0.048882 |
| 72 | 0.000452 | 0.049333 |
| 73 | 0.000452 | 0.049785 |
| 74 | 0.000452 | 0.050236 |
| 75 | 0.000459 | 0.050695 |
| 76 | 0.000464 | 0.051160 |
| 77 | 0.000467 | 0.051627 |
| 78 | 0.000471 | 0.052098 |
| 79 | 0.000482 | 0.052579 |
| 80 | 0.000482 | 0.053061 |
| 81 | 0.000482 | 0.053542 |
| 82 | 0.000482 | 0.054506 |
| 83 | 0.000482 | 0.054987 |
| 84 | 0.000502 | 0.055991 |
| 85 | 0.000502 | 0.056492 |
| 86 | 0.000502 | 0.056994 |
| 87 | 0.000502 | 0.057496 |
| 88 | 0.000502 | 0.057998 |
| 89 | 0.000502 | 0.059001 |
| 90 | 0.000505 | 0.061022 |
| 91 | 0.000510 | 0.062553 |
| 92 | 0.000516 | 0.063070 |
| 93 | 0.000516 | 0.063586 |
| 94 | 0.000518 | 0.064622 |
| 95 | 0.000527 | 0.065149 |
| 96 | 0.000527 | 0.065676 |
| 97 | 0.000527 | 0.066203 |
| 98 | 0.000530 | 0.067263 |
| 99 | 0.000531 | 0.069386 |
| 100 | 0.000535 | 0.069922 |
| 101 | 0.000542 | 0.070463 |
| 102 | 0.000542 | 0.071005 |
| 103 | 0.000542 | 0.071547 |
| 104 | 0.000542 | 0.072089 |
| 105 | 0.000543 | 0.073174 |
| 106 | 0.000545 | 0.074264 |
| 107 | 0.000552 | 0.074816 |
| 108 | 0.000552 | 0.075367 |
| 109 | 0.000556 | 0.078148 |
| 110 | 0.000557 | 0.078705 |
| 111 | 0.000559 | 0.079824 |
| 112 | 0.000561 | 0.080945 |
| 113 | 0.000570 | 0.081516 |
| 114 | 0.000571 | 0.082086 |
| 115 | 0.000572 | 0.083802 |
| 116 | 0.000578 | 0.084380 |
| 117 | 0.000579 | 0.084959 |
| 118 | 0.000581 | 0.086120 |
| 119 | 0.000582 | 0.089028 |
| 120 | 0.000592 | 0.089620 |
| 121 | 0.000593 | 0.090805 |
| 122 | 0.000602 | 0.091407 |
| 123 | 0.000607 | 0.093227 |
| 124 | 0.000612 | 0.094451 |
| 125 | 0.000615 | 0.098138 |
| 126 | 0.000617 | 0.098756 |
| 127 | 0.000631 | 0.099386 |
| 128 | 0.000642 | 0.100029 |
| 129 | 0.000664 | 0.103347 |
| 130 | 0.000684 | 0.104714 |
| 131 | 0.000688 | 0.105402 |
| 132 | 0.000702 | 0.108910 |
| 133 | 0.000722 | 0.109633 |
| 134 | 0.000731 | 0.110363 |
| 135 | 0.000740 | 0.111844 |
| 136 | 0.000745 | 0.114078 |
| 137 | 0.000753 | 0.117842 |
| 138 | 0.000781 | 0.118623 |
| 139 | 0.000793 | 0.120208 |
| 140 | 0.000801 | 0.122612 |
| 141 | 0.000803 | 0.123415 |
| 142 | 0.000803 | 0.125020 |
| 143 | 0.000806 | 0.126632 |
| 144 | 0.000809 | 0.128251 |
| 145 | 0.000821 | 0.130713 |
| 146 | 0.000836 | 0.131549 |
| 147 | 0.000861 | 0.133272 |
| 148 | 0.000878 | 0.135907 |
| 149 | 0.000921 | 0.136828 |
| 150 | 0.000937 | 0.139640 |
| 151 | 0.000946 | 0.148157 |
| 152 | 0.000962 | 0.151043 |
| 153 | 0.000963 | 0.152007 |
| 154 | 0.000988 | 0.152994 |
| 155 | 0.000999 | 0.153993 |
| 156 | 0.001007 | 0.158023 |
| 157 | 0.001015 | 0.159038 |
| 158 | 0.001022 | 0.161082 |
| 159 | 0.001034 | 0.163149 |
| 160 | 0.001059 | 0.164209 |
| 161 | 0.001065 | 0.166339 |
| 162 | 0.001079 | 0.167418 |
| 163 | 0.001093 | 0.168511 |
| 164 | 0.001104 | 0.172925 |
| 165 | 0.001121 | 0.177408 |
| 166 | 0.001207 | 0.178615 |
| 167 | 0.001263 | 0.186195 |
| 168 | 0.001271 | 0.190008 |
| 169 | 0.001391 | 0.194179 |
| 170 | 0.001428 | 0.202747 |
| 171 | 0.001455 | 0.204202 |
| 172 | 0.001462 | 0.205664 |
| 173 | 0.001496 | 0.211646 |
| 174 | 0.001578 | 0.214802 |
| 175 | 0.001602 | 0.218005 |
| 176 | 0.001643 | 0.221291 |
| 177 | 0.001849 | 0.226838 |
| 178 | 0.002242 | 0.233565 |
| 179 | 0.002676 | 0.236241 |
| 180 | 0.003266 | 0.239508 |
| 181 | 0.003312 | 0.246132 |
| 182 | 0.003475 | 0.249608 |
| 183 | 0.005084 | 0.254692 |
| 184 | 0.005229 | 0.265150 |
| 185 | 0.020165 | 0.285315 |
| 186 | 0.020546 | 0.305862 |
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
# Fit one decision tree per effective alpha on the pruning path
tree_cost_complexities = []
for ccp_alpha in ccp_alphas:
    tree_cost_complexity = DecisionTreeClassifier(random_state=1, ccp_alpha=ccp_alpha)
    tree_cost_complexity.fit(x_train, y_train)
    tree_cost_complexities.append(tree_cost_complexity)
print(
    "Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
        tree_cost_complexities[-1].tree_.node_count, ccp_alphas[-1]
    )
)
Number of nodes in the last tree is: 1 with ccp_alpha: 0.020546366248934245
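The single-node result is what the cost-complexity criterion predicts: pruning keeps the subtree T that minimises R(T) + alpha * |leaves(T)|, so a large enough alpha makes the root-only tree the cheapest option. The sketch below illustrates this with made-up numbers. The candidate subtrees and the helper `best_subtree` are hypothetical and are not taken from the fitted trees above.

```python
# Hypothetical candidate subtrees: (number_of_leaves, total_leaf_impurity).
# These values are illustrative only, not read from the pruning path above.
candidates = [(8, 0.00), (5, 0.03), (3, 0.08), (1, 0.30)]


def best_subtree(alpha, subtrees):
    """Return the subtree minimising impurity + alpha * number_of_leaves."""
    return min(subtrees, key=lambda t: t[1] + alpha * t[0])


for alpha in (0.0, 0.015, 0.05, 0.2):
    leaves, impurity = best_subtree(alpha, candidates)
    print("alpha={:<6} -> leaves={}, impurity={}".format(alpha, leaves, impurity))
```

As alpha grows, the penalty on the number of leaves dominates and the chosen subtree shrinks until only the root remains.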
tree_cost_complexities
[DecisionTreeClassifier(random_state=1), DecisionTreeClassifier(ccp_alpha=0.00019441099739112984, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00019993628895866792, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002340351561506303, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00023884485940723127, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00024081878386514143, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002508528998595224, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002580201255697945, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002580201255697945, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002633955448524985, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00026757642651682387, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00026757642651682387, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002693367977439082, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002709211318482842, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002709211318482842, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002709211318482842, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002759381898454746, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002787254442883582, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002795218027006107, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002795218027006107, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002822095123419627, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00028331621631193124, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00028331621631193124, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002842999531741254, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002842999531741254, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002866890284108828, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002879355024474518, random_state=1), 
DecisionTreeClassifier(ccp_alpha=0.0002879355024474518, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002879355024474518, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002879355024474518, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002879355024474518, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002926617165027762, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00029297041723561567, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00029330492906651825, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002936814437379775, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00029402293378883576, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002974398669762909, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002980746413622626, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002983829229907994, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002988262281538243, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0002995107990282534, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00030257751894528827, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0003478092380660963, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00035911573032521097, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0003612281757977122, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0003642521559604025, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00037793424077554194, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00038039915906108317, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0003907819774724419, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004013646397752358, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004013646397752358, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004013646397752358, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004013646397752358, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004013646397752358, random_state=1), 
DecisionTreeClassifier(ccp_alpha=0.0004013646397752358, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004013646397752358, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004013646397752358, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004013646397752358, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004013646397752358, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004013646397752358, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00040638169777242636, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004200327625554792, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00042812228242691797, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00042812228242691797, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004300335426163241, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004404318946713913, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004417192367091583, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004515352197471403, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004515352197471403, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004515352197471403, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004515352197471403, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004515352197471403, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004515352197471403, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004515352197471403, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004515352197471403, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00045870244545741234, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00046443622602563, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00046697232127695704, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0004710460008473254, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00048163756773028285, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00048163756773028285, random_state=1), 
DecisionTreeClassifier(ccp_alpha=0.00048163756773028285, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00048163756773028285, random_state=1), DecisionTreeClassifier(ccp_alpha=0.00048163756773028285, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005017057997190448, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005017057997190448, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005017057997190448, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005017057997190448, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005017057997190448, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005017057997190448, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005052894125741809, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005104311179750281, random_state=1), DecisionTreeClassifier(ccp_alpha=0.000516040251139589, random_state=1), DecisionTreeClassifier(ccp_alpha=0.000516040251139589, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005183773899581559, random_state=1), DecisionTreeClassifier(ccp_alpha=0.000526791089704997, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005270018904611815, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005272471858865594, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005298449510945913, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005308371042188603, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005351528530336476, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005418422636965684, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005418422636965684, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005418422636965684, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005418422636965684, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005426784400294336, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005447091539806772, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005518763796909492, random_state=1), 
DecisionTreeClassifier(ccp_alpha=0.0005518763796909492, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005561535913881854, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005572793652263853, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005590436054012214, random_state=1), DecisionTreeClassifier(ccp_alpha=0.000560827267254119, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005703602775753353, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005708297099025581, random_state=1), DecisionTreeClassifier(ccp_alpha=0.000571944611679711, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005779650812763396, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005788913073681289, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005805452825320376, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005815127452652447, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005919388821255176, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0005926399759181217, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0006020469596628537, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0006065689758842302, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0006120810756572344, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0006145435613477743, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0006174840611926705, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0006307158625039422, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0006421834236403775, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0006636051198125618, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0006837533327599551, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0006880536681861184, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0007016211647578815, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0007224563515954246, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0007306576104886642, random_state=1), 
DecisionTreeClassifier(ccp_alpha=0.0007402295464152584, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0007446370290566878, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0007529099700015055, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0007805108798486279, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0007928048659904385, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0008011702366220649, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0008027292795504716, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0008027292795504718, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0008062042980766641, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0008094749037483747, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0008205677079849263, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0008361763328650746, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0008613075666055799, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0008782240570320037, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0009214693649970268, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0009374308304229365, random_state=1), DecisionTreeClassifier(ccp_alpha=0.000946303702970302, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0009620464681959481, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0009632751354605657, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0009879723657119262, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0009989946679955868, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0010074083026756263, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0010148975380763463, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0010221494444322648, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0010335893919023928, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0010592031425364747, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0010650245923860425, random_state=1), 
DecisionTreeClassifier(ccp_alpha=0.0010792249202845224, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0010927911200603057, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0011036428650746414, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0011206283180997206, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0012068304964150851, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0012633923478899654, random_state=1), DecisionTreeClassifier(ccp_alpha=0.001270825129590943, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0013906582082405066, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0014279376221254826, random_state=1), DecisionTreeClassifier(ccp_alpha=0.001455263686006105, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0014618263970409089, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0014955328337809895, random_state=1), DecisionTreeClassifier(ccp_alpha=0.001577685559473639, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0016015596970504684, random_state=1), DecisionTreeClassifier(ccp_alpha=0.001643279457848994, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0018488766011916745, random_state=1), DecisionTreeClassifier(ccp_alpha=0.002242379204365763, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0026761721560623185, random_state=1), DecisionTreeClassifier(ccp_alpha=0.003266216606559855, random_state=1), DecisionTreeClassifier(ccp_alpha=0.003312300417693547, random_state=1), DecisionTreeClassifier(ccp_alpha=0.0034754759534945737, random_state=1), DecisionTreeClassifier(ccp_alpha=0.005084383014713506, random_state=1), DecisionTreeClassifier(ccp_alpha=0.005229154954473982, random_state=1), DecisionTreeClassifier(ccp_alpha=0.02016493455260912, random_state=1), DecisionTreeClassifier(ccp_alpha=0.020546366248934245, random_state=1)]
# Drop the last tree and its alpha: that tree is a single root node
tree_cost_complexities = tree_cost_complexities[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in tree_cost_complexities]
depth = [clf.tree_.max_depth for clf in tree_cost_complexities]
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
recall_train = []
for clf in tree_cost_complexities:
    pred_train3 = clf.predict(x_train)
    values_train = metrics.recall_score(y_train, pred_train3)
    recall_train.append(values_train)
recall_test = []
for clf in tree_cost_complexities:
    pred_test = clf.predict(x_test)
    values_test = metrics.recall_score(y_test, pred_test)
    recall_test.append(values_test)
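For reference, recall for the positive class is TP / (TP + FN), the share of actual buyers that the model finds. The sketch below computes it by hand on a tiny made-up example. The helper `binary_recall` and the label lists are hypothetical, and returning 0.0 when there are no positives is an assumption of this sketch.

```python
def binary_recall(y_true, y_pred, positive=1):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    # Assumption: report 0.0 when the data contains no positive labels
    return tp / (tp + fn) if (tp + fn) else 0.0


# Illustrative labels only (1 = package taken, 0 = not taken)
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
print(binary_recall(y_true, y_pred))  # 3 of the 4 actual positives are found
```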
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("Recall")
ax.set_title("Recall vs alpha for training and testing sets")
ax.plot(ccp_alphas, recall_train, marker="o", label="train", drawstyle="steps-post")
ax.plot(ccp_alphas, recall_test, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
# Select the model with the highest test recall (np.argmax returns the first maximum)
index_best_model = np.argmax(recall_test)
tree_cost_complex_model = tree_cost_complexities[index_best_model]
print(tree_cost_complex_model)
DecisionTreeClassifier(random_state=1)
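The printed model shows no ccp_alpha, so the selected tree is the one fit with the default ccp_alpha=0.0, that is, the unpruned tree. Because np.argmax returns the first index of the maximum and ccp_alphas is sorted in ascending order, any tie is also resolved toward the smallest alpha. If a smaller tree were preferred, one possible alternative is to take the largest alpha whose test recall stays within a tolerance of the best value. The helper `pick_simplest_within_tolerance`, the tolerance, and the numbers below are hypothetical and only sketch the idea.

```python
def pick_simplest_within_tolerance(alphas, scores, tol=0.01):
    """Index of the largest alpha whose score is within `tol` of the best score."""
    best = max(scores)
    eligible = [i for i, s in enumerate(scores) if s >= best - tol]
    return max(eligible, key=lambda i: alphas[i])


# Illustrative values only, not the notebook's actual recall curve
alphas = [0.0, 0.001, 0.002, 0.005]
test_recall = [0.74, 0.735, 0.70, 0.55]

idx = pick_simplest_within_tolerance(alphas, test_recall, tol=0.01)
print(alphas[idx], test_recall[idx])
```

With tol=0 the helper falls back to the single best score, which matches the argmax choice when there are no ties.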
# Calculate the model metrics for training dataset
tree_cost_complex_metrics_train = model_metrics(
    tree_cost_complex_model, x_train, y_train
)
tree_cost_complex_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0 | 1.0 |
# Confusion matrix for training dataset
confusion_matrix(tree_cost_complex_model, x_train, y_train)
# Calculate the model metrics for testing dataset
tree_cost_complex_metrics_test = model_metrics(tree_cost_complex_model, x_test, y_test)
tree_cost_complex_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.893333 | 0.742537 | 0.705674 | 0.723636 |
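As a quick consistency check on the test metrics, F1 is the harmonic mean of precision and recall, and the gap between training recall (1.0) and test recall shows how much the selected tree overfits. The sketch below only reuses the numbers reported in the tables; the variable names are chosen for illustration.

```python
# Values copied from the test-set metrics table
precision = 0.705674
recall = 0.742537

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 6))  # should be close to the reported F1 of 0.723636

# Training recall from the training-set metrics table
train_recall = 1.0
recall_gap = train_recall - recall
print(round(recall_gap, 6))  # drop in recall from training to test data
```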
# Confusion matrix for testing dataset
confusion_matrix(tree_cost_complex_model, x_test, y_test)
plt.figure(figsize=(10, 10))
out = tree.plot_tree(
    tree_cost_complex_model,
    feature_names=feature_names,
    filled=True,
    fontsize=9,
    node_ids=True,
    class_names=True,
)
# Draw the arrows between nodes in black so they stay visible
for o in out:
    arrow = o.arrow_patch
    if arrow is not None:
        arrow.set_edgecolor("black")
        arrow.set_linewidth(1)
plt.show()
# Text report showing the rules of the decision tree
print(
    tree.export_text(
        tree_cost_complex_model, feature_names=feature_names, show_weights=True
    )
)
|--- Passport <= 0.50 | |--- Age <= 22.50 | | |--- PitchSatisfactionScore <= 2.50 | | | |--- weights: [18.00, 0.00] class: 0 | | |--- PitchSatisfactionScore > 2.50 | | | |--- Occupation_Large Business <= 0.50 | | | | |--- MonthlyIncome <= 21427.50 | | | | | |--- PitchSatisfactionScore <= 3.50 | | | | | | |--- Gender_Male <= 0.50 | | | | | | | |--- Occupation_Salaried <= 0.50 | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | |--- Occupation_Salaried > 0.50 | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | |--- Gender_Male > 0.50 | | | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | | |--- PitchSatisfactionScore > 3.50 | | | | | | |--- DurationOfPitch <= 20.00 | | | | | | | |--- TypeofContact_Self Enquiry <= 0.50 | | | | | | | | |--- CityTier <= 2.00 | | | | | | | | | |--- DurationOfPitch <= 10.00 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- DurationOfPitch > 10.00 | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | |--- CityTier > 2.00 | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | |--- TypeofContact_Self Enquiry > 0.50 | | | | | | | | |--- DurationOfPitch <= 9.50 | | | | | | | | | |--- MonthlyIncome <= 17413.50 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | |--- MonthlyIncome > 17413.50 | | | | | | | | | | |--- NumberOfFollowups <= 4.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- NumberOfFollowups > 4.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- DurationOfPitch > 9.50 | | | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | | |--- DurationOfPitch > 20.00 | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | |--- MonthlyIncome > 21427.50 | | | | | |--- weights: [6.00, 0.00] class: 0 | | | |--- Occupation_Large Business > 0.50 | | | | |--- weights: [0.00, 9.00] class: 1 | |--- Age > 22.50 | | |--- PreferredPropertyStar <= 4.50 | | | |--- 
NumberOfFollowups <= 5.25 | | | | |--- Occupation_Large Business <= 0.50 | | | | | |--- MonthlyIncome <= 16559.00 | | | | | | |--- MaritalStatus_Single <= 0.50 | | | | | | | |--- PitchSatisfactionScore <= 1.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- PitchSatisfactionScore > 1.50 | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | |--- MaritalStatus_Single > 0.50 | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | |--- MonthlyIncome > 16559.00 | | | | | | |--- ProductPitched_Standard <= 0.50 | | | | | | | |--- MonthlyIncome <= 20161.50 | | | | | | | | |--- MonthlyIncome <= 20153.50 | | | | | | | | | |--- NumberOfFollowups <= 4.50 | | | | | | | | | | |--- MonthlyIncome <= 19572.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- MonthlyIncome > 19572.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- NumberOfFollowups > 4.50 | | | | | | | | | | |--- Gender_Male <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- Gender_Male > 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- MonthlyIncome > 20153.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- MonthlyIncome > 20161.50 | | | | | | | | |--- Designation_Executive <= 0.50 | | | | | | | | | |--- MonthlyIncome <= 23250.50 | | | | | | | | | | |--- ProductPitched_Super Deluxe <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- ProductPitched_Super Deluxe > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- MonthlyIncome > 23250.50 | | | | | | | | | | |--- MonthlyIncome <= 23568.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- MonthlyIncome > 23568.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- Designation_Executive > 0.50 | | | | | | | | | |--- MonthlyIncome <= 24233.00 | | | | | | | | 
| | |--- Age <= 32.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- Age > 32.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- MonthlyIncome > 24233.00 | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | |--- ProductPitched_Standard > 0.50 | | | | | | | |--- DurationOfPitch <= 15.50 | | | | | | | | |--- MonthlyIncome <= 21584.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- MonthlyIncome > 21584.50 | | | | | | | | | |--- MonthlyIncome <= 25668.00 | | | | | | | | | | |--- NumberOfFollowups <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- NumberOfFollowups > 4.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- MonthlyIncome > 25668.00 | | | | | | | | | | |--- weights: [94.00, 0.00] class: 0 | | | | | | | |--- DurationOfPitch > 15.50 | | | | | | | | |--- Age <= 43.50 | | | | | | | | | |--- NumberOfTrips <= 6.50 | | | | | | | | | | |--- Age <= 31.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- Age > 31.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- NumberOfTrips > 6.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- Age > 43.50 | | | | | | | | | |--- NumberOfTrips <= 3.50 | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1 | | | | | | | | | |--- NumberOfTrips > 3.50 | | | | | | | | | | |--- DurationOfPitch <= 17.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- DurationOfPitch > 17.50 | | | | | | | | | | | |--- weights: [14.00, 0.00] class: 0 | | | | |--- Occupation_Large Business > 0.50 | | | | | |--- Age <= 57.50 | | | | | | |--- Age <= 30.50 | | | | | | | |--- NumberOfTrips <= 5.50 | | | | | | | | |--- MaritalStatus_Single <= 0.50 | | | | | | | | | |--- DurationOfPitch <= 6.50 | | | | | | | | | | |--- MonthlyIncome <= 18069.00 | | | | | | | | | | | |--- 
weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- MonthlyIncome > 18069.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- DurationOfPitch > 6.50 | | | | | | | | | | |--- weights: [21.00, 0.00] class: 0 | | | | | | | | |--- MaritalStatus_Single > 0.50 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- NumberOfTrips > 5.50 | | | | | | | | |--- MonthlyIncome <= 22756.00 | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | | |--- MonthlyIncome > 22756.00 | | | | | | | | | |--- MonthlyIncome <= 23691.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- MonthlyIncome > 23691.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- Age > 30.50 | | | | | | | |--- MonthlyIncome <= 17322.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- MonthlyIncome > 17322.50 | | | | | | | | |--- MonthlyIncome <= 32290.38 | | | | | | | | | |--- Age <= 56.50 | | | | | | | | | | |--- CityTier <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- CityTier > 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- Age > 56.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- MonthlyIncome > 32290.38 | | | | | | | | | |--- DurationOfPitch <= 14.50 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | |--- DurationOfPitch > 14.50 | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | |--- Age > 57.50 | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | |--- NumberOfFollowups > 5.25 | | | | |--- CityTier <= 1.50 | | | | | |--- NumberOfTrips <= 6.50 | | | | | | |--- MaritalStatus_Single <= 0.50 | | | | | | | |--- PitchSatisfactionScore <= 4.50 | | | | | | | | |--- weights: [20.00, 0.00] class: 0 | | | | | | | |--- PitchSatisfactionScore > 4.50 | | | | | | | | |--- Designation_Executive <= 0.50 | | | | | | | | | |--- weights: [4.00, 
0.00] class: 0 | | | | | | | | |--- Designation_Executive > 0.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- MaritalStatus_Single > 0.50 | | | | | | | |--- PitchSatisfactionScore <= 3.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- PitchSatisfactionScore > 3.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- NumberOfTrips > 6.50 | | | | | | |--- Gender_Male <= 0.50 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- Gender_Male > 0.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- CityTier > 1.50 | | | | | |--- Designation_Manager <= 0.50 | | | | | | |--- NumberOfTrips <= 4.50 | | | | | | | |--- weights: [0.00, 11.00] class: 1 | | | | | | |--- NumberOfTrips > 4.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- Designation_Manager > 0.50 | | | | | | |--- Age <= 43.00 | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | |--- Age > 43.00 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | |--- PreferredPropertyStar > 4.50 | | | |--- MaritalStatus_Single <= 0.50 | | | | |--- Age <= 46.50 | | | | | |--- MaritalStatus_Unmarried <= 0.50 | | | | | | |--- CityTier <= 1.50 | | | | | | | |--- NumberOfTrips <= 4.50 | | | | | | | | |--- DurationOfPitch <= 9.50 | | | | | | | | | |--- MonthlyIncome <= 21555.00 | | | | | | | | | | |--- MonthlyIncome <= 21202.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- MonthlyIncome > 21202.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- MonthlyIncome > 21555.00 | | | | | | | | | | |--- weights: [27.00, 0.00] class: 0 | | | | | | | | |--- DurationOfPitch > 9.50 | | | | | | | | | |--- weights: [82.00, 0.00] class: 0 | | | | | | | |--- NumberOfTrips > 4.50 | | | | | | | | |--- Age <= 41.50 | | | | | | | | | |--- Age <= 27.00 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- Age > 27.00 | | | | | | | | | | |--- 
MonthlyIncome <= 17150.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- MonthlyIncome > 17150.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- Age > 41.50 | | | | | | | | | |--- ProductPitched_Super Deluxe <= 0.50 | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | | |--- ProductPitched_Super Deluxe > 0.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- CityTier > 1.50 | | | | | | | |--- DurationOfPitch <= 15.50 | | | | | | | | |--- Age <= 43.50 | | | | | | | | | |--- Age <= 31.50 | | | | | | | | | | |--- PitchSatisfactionScore <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- PitchSatisfactionScore > 4.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- Age > 31.50 | | | | | | | | | | |--- weights: [40.00, 0.00] class: 0 | | | | | | | | |--- Age > 43.50 | | | | | | | | | |--- Age <= 45.50 | | | | | | | | | | |--- TypeofContact_Unknown <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | | |--- TypeofContact_Unknown > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- Age > 45.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | |--- DurationOfPitch > 15.50 | | | | | | | | |--- DurationOfPitch <= 20.50 | | | | | | | | | |--- ProductPitched_Super Deluxe <= 0.50 | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | | | |--- ProductPitched_Super Deluxe > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- DurationOfPitch > 20.50 | | | | | | | | | |--- NumberOfFollowups <= 2.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- NumberOfFollowups > 2.50 | | | | | | | | | | |--- Age <= 30.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- Age > 30.50 | | | | | | | | | | | |--- truncated branch of depth 2 
| | | | | |--- MaritalStatus_Unmarried > 0.50 | | | | | | |--- DurationOfPitch <= 14.50 | | | | | | | |--- MonthlyIncome <= 21941.00 | | | | | | | | |--- MonthlyIncome <= 21636.00 | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | |--- MonthlyIncome > 21636.00 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- MonthlyIncome > 21941.00 | | | | | | | | |--- weights: [15.00, 0.00] class: 0 | | | | | | |--- DurationOfPitch > 14.50 | | | | | | | |--- PitchSatisfactionScore <= 4.50 | | | | | | | | |--- MonthlyIncome <= 24403.50 | | | | | | | | | |--- DurationOfPitch <= 33.00 | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1 | | | | | | | | | |--- DurationOfPitch > 33.00 | | | | | | | | | | |--- Designation_Executive <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- Designation_Executive > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- MonthlyIncome > 24403.50 | | | | | | | | | |--- DurationOfPitch <= 27.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- DurationOfPitch > 27.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- PitchSatisfactionScore > 4.50 | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | |--- Age > 46.50 | | | | | |--- weights: [59.00, 0.00] class: 0 | | | |--- MaritalStatus_Single > 0.50 | | | | |--- DurationOfPitch <= 13.50 | | | | | |--- Age <= 30.50 | | | | | | |--- Age <= 28.50 | | | | | | | |--- OwnCar <= 0.50 | | | | | | | | |--- NumberOfPersonVisiting <= 2.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- NumberOfPersonVisiting > 2.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- OwnCar > 0.50 | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- Age > 28.50 | | | | | | | |--- Occupation_Salaried <= 0.50 | | | | | | | | |--- Gender_Male <= 0.50 | | | | | | | | | |--- weights: [1.00, 0.00] 
class: 0 | | | | | | | | |--- Gender_Male > 0.50 | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | |--- Occupation_Salaried > 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- Age > 30.50 | | | | | | |--- DurationOfPitch <= 11.00 | | | | | | | |--- weights: [20.00, 0.00] class: 0 | | | | | | |--- DurationOfPitch > 11.00 | | | | | | | |--- NumberOfTrips <= 2.50 | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | |--- NumberOfTrips > 2.50 | | | | | | | | |--- ProductPitched_Standard <= 0.50 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | |--- ProductPitched_Standard > 0.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- DurationOfPitch > 13.50 | | | | | |--- PitchSatisfactionScore <= 3.50 | | | | | | |--- Age <= 46.00 | | | | | | | |--- DurationOfPitch <= 32.00 | | | | | | | | |--- weights: [0.00, 14.00] class: 1 | | | | | | | |--- DurationOfPitch > 32.00 | | | | | | | | |--- Age <= 31.00 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- Age > 31.00 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- Age > 46.00 | | | | | | | |--- NumberOfTrips <= 3.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- NumberOfTrips > 3.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- PitchSatisfactionScore > 3.50 | | | | | | |--- DurationOfPitch <= 18.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- DurationOfPitch > 18.50 | | | | | | | |--- weights: [5.00, 0.00] class: 0 |--- Passport > 0.50 | |--- Designation_Executive <= 0.50 | | |--- CityTier <= 2.50 | | | |--- DurationOfPitch <= 27.50 | | | | |--- Age <= 54.50 | | | | | |--- PitchSatisfactionScore <= 4.50 | | | | | | |--- Age <= 27.50 | | | | | | | |--- Gender_Male <= 0.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- Gender_Male > 0.50 | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | |--- 
Age > 27.50 | | | | | | | |--- weights: [193.00, 0.00] class: 0 | | | | | |--- PitchSatisfactionScore > 4.50 | | | | | | |--- NumberOfTrips <= 2.50 | | | | | | | |--- NumberOfChildrenVisiting <= 1.50 | | | | | | | | |--- PreferredPropertyStar <= 3.50 | | | | | | | | | |--- Age <= 35.00 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | |--- Age > 35.00 | | | | | | | | | | |--- DurationOfPitch <= 14.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- DurationOfPitch > 14.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- PreferredPropertyStar > 3.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- NumberOfChildrenVisiting > 1.50 | | | | | | | | |--- MaritalStatus_Single <= 0.50 | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | |--- MaritalStatus_Single > 0.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- NumberOfTrips > 2.50 | | | | | | | |--- NumberOfFollowups <= 5.25 | | | | | | | | |--- weights: [25.00, 0.00] class: 0 | | | | | | | |--- NumberOfFollowups > 5.25 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- Age > 54.50 | | | | | |--- NumberOfTrips <= 5.50 | | | | | | |--- ProductPitched_Standard <= 0.50 | | | | | | | |--- Designation_VP <= 0.50 | | | | | | | | |--- weights: [21.00, 0.00] class: 0 | | | | | | | |--- Designation_VP > 0.50 | | | | | | | | |--- PitchSatisfactionScore <= 2.50 | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | |--- PitchSatisfactionScore > 2.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- ProductPitched_Standard > 0.50 | | | | | | | |--- MonthlyIncome <= 29210.00 | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- MonthlyIncome > 29210.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- NumberOfTrips > 5.50 | | | | | | |--- MaritalStatus_Unmarried <= 0.50 | | | | | | | |--- weights: 
[0.00, 4.00] class: 1 | | | | | | |--- MaritalStatus_Unmarried > 0.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | |--- DurationOfPitch > 27.50 | | | | |--- Age <= 49.00 | | | | | |--- Age <= 46.50 | | | | | | |--- NumberOfTrips <= 1.50 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- NumberOfTrips > 1.50 | | | | | | | |--- PitchSatisfactionScore <= 3.50 | | | | | | | | |--- MonthlyIncome <= 28409.50 | | | | | | | | | |--- weights: [15.00, 0.00] class: 0 | | | | | | | | |--- MonthlyIncome > 28409.50 | | | | | | | | | |--- MonthlyIncome <= 28847.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- MonthlyIncome > 28847.00 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- PitchSatisfactionScore > 3.50 | | | | | | | | |--- Gender_Male <= 0.50 | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | |--- Gender_Male > 0.50 | | | | | | | | | |--- Occupation_Large Business <= 0.50 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | |--- Occupation_Large Business > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- Age > 46.50 | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | |--- Age > 49.00 | | | | | |--- weights: [17.00, 0.00] class: 0 | | |--- CityTier > 2.50 | | | |--- DurationOfPitch <= 17.50 | | | | |--- MaritalStatus_Married <= 0.50 | | | | | |--- MonthlyIncome <= 21828.50 | | | | | | |--- Occupation_Salaried <= 0.50 | | | | | | | |--- Age <= 26.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- Age > 26.50 | | | | | | | | |--- PitchSatisfactionScore <= 3.50 | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | |--- PitchSatisfactionScore > 3.50 | | | | | | | | | |--- Age <= 42.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- Age > 42.50 | | | | | | | | | | |--- Occupation_Small Business <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] 
class: 1 | | | | | | | | | | |--- Occupation_Small Business > 0.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- Occupation_Salaried > 0.50 | | | | | | | |--- PitchSatisfactionScore <= 4.50 | | | | | | | | |--- MonthlyIncome <= 19523.00 | | | | | | | | | |--- PitchSatisfactionScore <= 2.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- PitchSatisfactionScore > 2.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- MonthlyIncome > 19523.00 | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | |--- PitchSatisfactionScore > 4.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- MonthlyIncome > 21828.50 | | | | | | |--- MaritalStatus_Unmarried <= 0.50 | | | | | | | |--- PitchSatisfactionScore <= 4.50 | | | | | | | | |--- NumberOfFollowups <= 4.50 | | | | | | | | | |--- weights: [19.00, 0.00] class: 0 | | | | | | | | |--- NumberOfFollowups > 4.50 | | | | | | | | | |--- PreferredPropertyStar <= 3.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- PreferredPropertyStar > 3.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- PitchSatisfactionScore > 4.50 | | | | | | | | |--- Age <= 43.00 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- Age > 43.00 | | | | | | | | | |--- ProductPitched_Super Deluxe <= 0.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- ProductPitched_Super Deluxe > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- MaritalStatus_Unmarried > 0.50 | | | | | | | |--- DurationOfPitch <= 8.50 | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | |--- DurationOfPitch > 8.50 | | | | | | | | |--- PreferredPropertyStar <= 3.50 | | | | | | | | | |--- Occupation_Small Business <= 0.50 | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | | | | | | |--- Occupation_Small Business > 0.50 | | 
| | | | | | | | |--- Age <= 37.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- Age > 37.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- PreferredPropertyStar > 3.50 | | | | | | | | | |--- DurationOfPitch <= 11.00 | | | | | | | | | | |--- Occupation_Small Business <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- Occupation_Small Business > 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- DurationOfPitch > 11.00 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | |--- MaritalStatus_Married > 0.50 | | | | | |--- NumberOfFollowups <= 5.25 | | | | | | |--- MonthlyIncome <= 21815.00 | | | | | | | |--- MonthlyIncome <= 21745.00 | | | | | | | | |--- DurationOfPitch <= 9.50 | | | | | | | | | |--- PitchSatisfactionScore <= 3.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- PitchSatisfactionScore > 3.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- DurationOfPitch > 9.50 | | | | | | | | | |--- weights: [17.00, 0.00] class: 0 | | | | | | | |--- MonthlyIncome > 21745.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- MonthlyIncome > 21815.00 | | | | | | | |--- weights: [64.00, 0.00] class: 0 | | | | | |--- NumberOfFollowups > 5.25 | | | | | | |--- Age <= 35.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- Age > 35.50 | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | |--- DurationOfPitch > 17.50 | | | | |--- MaritalStatus_Single <= 0.50 | | | | | |--- MonthlyIncome <= 28257.00 | | | | | | |--- NumberOfFollowups <= 4.50 | | | | | | | |--- DurationOfPitch <= 29.50 | | | | | | | | |--- Age <= 34.50 | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | |--- Age > 34.50 | | | | | | | | | |--- TypeofContact_Self Enquiry <= 0.50 | | | | | | | | | | |--- ProductPitched_Standard <= 0.50 | | | | | | | | 
| | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- ProductPitched_Standard > 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- TypeofContact_Self Enquiry > 0.50 | | | | | | | | | | |--- MonthlyIncome <= 21731.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- MonthlyIncome > 21731.00 | | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | |--- DurationOfPitch > 29.50 | | | | | | | | |--- NumberOfChildrenVisiting <= 2.50 | | | | | | | | | |--- Occupation_Large Business <= 0.50 | | | | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | | | | | |--- Occupation_Large Business > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- NumberOfChildrenVisiting > 2.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- NumberOfFollowups > 4.50 | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | |--- MonthlyIncome > 28257.00 | | | | | | |--- NumberOfFollowups <= 4.50 | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | |--- NumberOfFollowups > 4.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- MaritalStatus_Single > 0.50 | | | | | |--- MonthlyIncome <= 24568.00 | | | | | | |--- MonthlyIncome <= 20372.50 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- MonthlyIncome > 20372.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- MonthlyIncome > 24568.00 | | | | | | |--- weights: [0.00, 13.00] class: 1 | |--- Designation_Executive > 0.50 | | |--- NumberOfFollowups <= 2.50 | | | |--- TypeofContact_Self Enquiry <= 0.50 | | | | |--- NumberOfTrips <= 3.00 | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- NumberOfTrips > 3.00 | | | | | |--- weights: [0.00, 2.00] class: 1 | | | |--- TypeofContact_Self Enquiry > 0.50 | | | | |--- MonthlyIncome <= 16907.00 | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- MonthlyIncome > 16907.00 | | | | | |--- 
NumberOfPersonVisiting <= 3.50 | | | | | | |--- weights: [20.00, 0.00] class: 0 | | | | | |--- NumberOfPersonVisiting > 3.50 | | | | | | |--- MonthlyIncome <= 20887.50 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- MonthlyIncome > 20887.50 | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | |--- NumberOfFollowups > 2.50 | | | |--- DurationOfPitch <= 22.50 | | | | |--- Age <= 25.50 | | | | | |--- Occupation_Large Business <= 0.50 | | | | | | |--- PitchSatisfactionScore <= 1.50 | | | | | | | |--- Occupation_Salaried <= 0.50 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- Occupation_Salaried > 0.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- PitchSatisfactionScore > 1.50 | | | | | | | |--- MaritalStatus_Married <= 0.50 | | | | | | | | |--- weights: [0.00, 30.00] class: 1 | | | | | | | |--- MaritalStatus_Married > 0.50 | | | | | | | | |--- PitchSatisfactionScore <= 3.50 | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | |--- PitchSatisfactionScore > 3.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- Occupation_Large Business > 0.50 | | | | | | |--- CityTier <= 2.00 | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- CityTier > 2.00 | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | |--- Age > 25.50 | | | | | |--- NumberOfPersonVisiting <= 2.50 | | | | | | |--- NumberOfFollowups <= 3.50 | | | | | | | |--- MonthlyIncome <= 19708.00 | | | | | | | | |--- MonthlyIncome <= 18267.00 | | | | | | | | | |--- Age <= 47.50 | | | | | | | | | | |--- Age <= 26.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- Age > 26.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- Age > 47.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- MonthlyIncome > 18267.00 | | | | | | | | | |--- NumberOfTrips <= 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | 
| | | |--- NumberOfTrips > 1.50 | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | |--- MonthlyIncome > 19708.00 | | | | | | | | |--- weights: [13.00, 0.00] class: 0 | | | | | | |--- NumberOfFollowups > 3.50 | | | | | | | |--- NumberOfTrips <= 2.50 | | | | | | | | |--- Gender_Male <= 0.50 | | | | | | | | | |--- MonthlyIncome <= 17330.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- MonthlyIncome > 17330.50 | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | |--- Gender_Male > 0.50 | | | | | | | | | |--- MonthlyIncome <= 17693.00 | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | | | |--- MonthlyIncome > 17693.00 | | | | | | | | | | |--- MonthlyIncome <= 20466.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- MonthlyIncome > 20466.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- NumberOfTrips > 2.50 | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | |--- NumberOfPersonVisiting > 2.50 | | | | | | |--- MaritalStatus_Single <= 0.50 | | | | | | | |--- MonthlyIncome <= 23624.50 | | | | | | | | |--- PreferredPropertyStar <= 4.50 | | | | | | | | | |--- Age <= 44.00 | | | | | | | | | | |--- PitchSatisfactionScore <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- PitchSatisfactionScore > 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- Age > 44.00 | | | | | | | | | | |--- NumberOfTrips <= 4.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- NumberOfTrips > 4.00 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- PreferredPropertyStar > 4.50 | | | | | | | | | |--- Age <= 41.50 | | | | | | | | | | |--- DurationOfPitch <= 20.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- DurationOfPitch > 20.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] 
class: 0 | | | | | | | | | |--- Age > 41.50 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | |--- MonthlyIncome > 23624.50 | | | | | | | | |--- NumberOfFollowups <= 4.50 | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | | |--- NumberOfFollowups > 4.50 | | | | | | | | | |--- Occupation_Salaried <= 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- Occupation_Salaried > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- MaritalStatus_Single > 0.50 | | | | | | | |--- DurationOfPitch <= 12.50 | | | | | | | | |--- Occupation_Small Business <= 0.50 | | | | | | | | | |--- DurationOfPitch <= 11.50 | | | | | | | | | | |--- PreferredPropertyStar <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- PreferredPropertyStar > 4.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- DurationOfPitch > 11.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- Occupation_Small Business > 0.50 | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | |--- DurationOfPitch > 12.50 | | | | | | | | |--- weights: [0.00, 12.00] class: 1 | | | |--- DurationOfPitch > 22.50 | | | | |--- DurationOfPitch <= 32.50 | | | | | |--- NumberOfPersonVisiting <= 1.50 | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- NumberOfPersonVisiting > 1.50 | | | | | | |--- MonthlyIncome <= 22644.50 | | | | | | | |--- DurationOfPitch <= 23.50 | | | | | | | | |--- Age <= 24.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- Age > 24.50 | | | | | | | | | |--- NumberOfTrips <= 1.50 | | | | | | | | | | |--- CityTier <= 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- CityTier > 1.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- NumberOfTrips > 1.50 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | 
|--- DurationOfPitch > 23.50 | | | | | | | | |--- weights: [0.00, 44.00] class: 1 | | | | | | |--- MonthlyIncome > 22644.50 | | | | | | | |--- NumberOfTrips <= 3.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- NumberOfTrips > 3.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- DurationOfPitch > 32.50 | | | | | |--- weights: [2.00, 0.00] class: 0
# Feature importances of the fitted tree (also known as Gini importance)
print(
pd.DataFrame(
tree_cost_complex_model.feature_importances_,
columns=["Imp"],
index=x_train.columns,
).sort_values(by="Imp", ascending=False)
)
                                  Imp
Age                          0.149965
DurationOfPitch              0.131285
MonthlyIncome                0.125814
NumberOfTrips                0.077517
Designation_Executive        0.069702
Passport                     0.067175
PitchSatisfactionScore       0.063612
NumberOfFollowups            0.045043
CityTier                     0.044654
MaritalStatus_Single         0.028679
Gender_Male                  0.025558
PreferredPropertyStar        0.021807
Occupation_Large Business    0.018757
MaritalStatus_Unmarried      0.016953
NumberOfPersonVisiting       0.016942
Occupation_Salaried          0.015564
TypeofContact_Self Enquiry   0.012101
Occupation_Small Business    0.011596
MaritalStatus_Married        0.011523
NumberOfChildrenVisiting     0.009545
ProductPitched_Standard      0.009506
OwnCar                       0.009157
ProductPitched_Super Deluxe  0.008014
Designation_Manager          0.005514
TypeofContact_Unknown        0.001640
ProductPitched_Deluxe        0.001531
Designation_VP               0.000844
ProductPitched_King          0.000000
Designation_Senior Manager   0.000000
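Because the categorical columns were one-hot encoded, the importance of each original feature is spread across several dummy columns. The following is a minimal sketch, assuming the dummy columns follow the `Prefix_Value` naming visible in the output above; it sums the importances back per original feature. The values are copied from the printed output, and the helper name `group_importances` is hypothetical, not part of the notebook.

```python
from collections import defaultdict

# Importances copied from the printed output above (a subset of one-hot columns).
one_hot_importances = {
    "Designation_Executive": 0.069702,
    "Designation_Manager": 0.005514,
    "Designation_VP": 0.000844,
    "Designation_Senior Manager": 0.000000,
    "MaritalStatus_Single": 0.028679,
    "MaritalStatus_Unmarried": 0.016953,
    "MaritalStatus_Married": 0.011523,
    "Occupation_Large Business": 0.018757,
    "Occupation_Salaried": 0.015564,
    "Occupation_Small Business": 0.011596,
}


def group_importances(importances):
    """Sum importances of dummy columns sharing the prefix before the first '_'."""
    grouped = defaultdict(float)
    for column, value in importances.items():
        prefix = column.split("_", 1)[0]  # assumes 'Prefix_Value' column names
        grouped[prefix] += value
    return dict(grouped)


grouped = group_importances(one_hot_importances)
for name, value in sorted(grouped.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.6f}")
```

Summed this way, Designation as a whole comes to about 0.076, close to NumberOfTrips at 0.078, which is not obvious from the per-column listing.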
importances = tree_cost_complex_model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
# Models with default hyperparameters
default_hyperparameters
| | bagging_metrics_train | bagging_metrics_test | random_forest_metrics_train | random_forest_metrics_test | tree_metrics_train | tree_metrics_test |
|---|---|---|---|---|---|---|
| Accuracy | 0.994281 | 0.903860 | 1.0 | 0.910877 | 1.0 | 0.893333 |
| Recall | 0.971246 | 0.630597 | 1.0 | 0.597015 | 1.0 | 0.742537 |
| Precision | 0.998358 | 0.816425 | 1.0 | 0.893855 | 1.0 | 0.705674 |
| F1 | 0.984615 | 0.711579 | 1.0 | 0.715884 | 1.0 | 0.723636 |
# Models with tuned hyperparameters
tuned_hyperparameters = pd.concat(
[
bagging_tuned_metrics_train.T,
bagging_tuned_metrics_test.T,
bagging_weighted_metrics_train.T,
bagging_weighted_metrics_test.T,
rf_tuned_metrics_train.T,
rf_tuned_metrics_test.T,
rf_weighted_metrics_train.T,
rf_weighted_metrics_test.T,
tree_tuned_metrics_train.T,
tree_tuned_metrics_test.T,
tree_cost_complex_metrics_train.T,
tree_cost_complex_metrics_test.T,
],
axis=1,
)
tuned_hyperparameters.columns = [
"bagging_tuned_metrics_train",
"bagging_tuned_metrics_test",
"bagging_weighted_metrics_train",
"bagging_weighted_metrics_test",
"rf_tuned_metrics_train",
"rf_tuned_metrics_test",
"rf_weighted_metrics_train",
"rf_weighted_metrics_test",
"tree_tuned_metrics_train",
"tree_tuned_metrics_test",
"tree_cost_complex_metrics_train",
"tree_cost_complex_metrics_test",
]
tuned_hyperparameters
| | bagging_tuned_metrics_train | bagging_tuned_metrics_test | bagging_weighted_metrics_train | bagging_weighted_metrics_test | rf_tuned_metrics_train | rf_tuned_metrics_test | rf_weighted_metrics_train | rf_weighted_metrics_test | tree_tuned_metrics_train | tree_tuned_metrics_test | tree_cost_complex_metrics_train | tree_cost_complex_metrics_test |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 1.0 | 0.915088 | 0.994582 | 0.911579 | 0.917219 | 0.861754 | 0.894040 | 0.842105 | 0.829922 | 0.804211 | 1.0 | 0.893333 |
| Recall | 1.0 | 0.638060 | 0.972843 | 0.615672 | 0.595847 | 0.388060 | 0.691693 | 0.514925 | 0.597444 | 0.526119 | 1.0 | 0.742537 |
| Precision | 1.0 | 0.876923 | 0.998361 | 0.877660 | 0.944304 | 0.759124 | 0.731419 | 0.592275 | 0.544396 | 0.481229 | 1.0 | 0.705674 |
| F1 | 1.0 | 0.738661 | 0.985437 | 0.723684 | 0.730656 | 0.513580 | 0.711002 | 0.550898 | 0.569688 | 0.502674 | 1.0 | 0.723636 |
Bagging Classifier
Tuning Bagging Classifier with GridSearch
Tuning Bagging Classifier with weighted Decision Tree base estimator
Random Forest classifier
Tuning Random Forest classifier with GridSearch
Tuning Random Forest classifier with class_weights
Decision Tree Classifier
Tuning Decision Tree Classifier - Pre-pruning with GridSearch
Tuning Decision Tree - Post-pruning with Cost Complexity Analysis
---- Choose the best model ----
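Since recall is the metric used for tuning in this notebook, one way to shortlist candidates is to rank the models by test-set recall. The sketch below copies the test recall values from the two comparison tables above into a plain dictionary; the variable names are illustrative only.

```python
# Test-set recall copied from the default and tuned comparison tables above.
test_recall = {
    "bagging": 0.630597,
    "random_forest": 0.597015,
    "tree": 0.742537,
    "bagging_tuned": 0.638060,
    "bagging_weighted": 0.615672,
    "rf_tuned": 0.388060,
    "rf_weighted": 0.514925,
    "tree_tuned": 0.526119,
    "tree_cost_complex": 0.742537,
}

# Highest recall first; ties keep their insertion order.
ranking = sorted(test_recall.items(), key=lambda kv: kv[1], reverse=True)
for name, recall in ranking:
    print(f"{name:<20} {recall:.6f}")
```

The default tree and the cost-complexity-pruned tree tie at the top, and their other test metrics are identical in the tables as well. Both also score 1.0 on every training metric, so test recall alone should not settle the choice.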
# Build the model
ada_boost = AdaBoostClassifier(random_state=1)
ada_boost.fit(x_train, y_train)
AdaBoostClassifier(random_state=1)
# Calculate the model metrics for training dataset
ada_boost_metrics_train = model_metrics(ada_boost, x_train, y_train)
ada_boost_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.8528 | 0.34984 | 0.727575 | 0.472492 |
# Confusion matrix for training dataset
confusion_matrix(ada_boost, x_train, y_train)
# Calculate the model metrics for testing dataset
ada_boost_metrics_test = model_metrics(ada_boost, x_test, y_test)
ada_boost_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.841404 | 0.30597 | 0.672131 | 0.420513 |
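As a quick sanity check on the table above, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. This is a small sketch using the rounded test-set values printed above; the function name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Test-set values for the default AdaBoost model, copied from the table above.
precision, recall = 0.672131, 0.30597
f1 = f1_from(precision, recall)
print(round(f1, 6))
```

The result agrees with the reported F1 of 0.420513 to within rounding, which also shows how a low recall drags F1 down even when precision is moderate.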
# Confusion matrix for testing dataset
confusion_matrix(ada_boost, x_test, y_test)
# Build the model
gradient_boost = GradientBoostingClassifier(random_state=1)
gradient_boost.fit(x_train, y_train)
GradientBoostingClassifier(random_state=1)
# Calculate the model metrics for training dataset
gradient_boost_metrics_train = model_metrics(gradient_boost, x_train, y_train)
gradient_boost_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.891331 | 0.490415 | 0.879656 | 0.629744 |
# Confusion matrix for training dataset
confusion_matrix(gradient_boost, x_train, y_train)
# Calculate the model metrics for testing dataset
gradient_boost_metrics_test = model_metrics(gradient_boost, x_test, y_test)
gradient_boost_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.860351 | 0.399254 | 0.737931 | 0.51816 |
# Confusion matrix for testing dataset
confusion_matrix(gradient_boost, x_test, y_test)
# Build the model
xg_boost = XGBClassifier(random_state=1, eval_metric="logloss")
xg_boost.fit(x_train, y_train)
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, eval_metric='logloss',
gamma=0, gpu_id=-1, importance_type='gain',
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=4,
num_parallel_tree=1, random_state=1, reg_alpha=0, reg_lambda=1,
scale_pos_weight=1, subsample=1, tree_method='exact',
validate_parameters=1, verbosity=None)
# Calculate the model metrics for training dataset
xg_boost_metrics_train = model_metrics(xg_boost, x_train, y_train)
xg_boost_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0 | 1.0 |
# Confusion matrix for training dataset
confusion_matrix(xg_boost, x_train, y_train)
# Calculate the model metrics for testing dataset
xg_boost_metrics_test = model_metrics(xg_boost, x_test, y_test)
xg_boost_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.923509 | 0.686567 | 0.880383 | 0.771488 |
# Confusion matrix for testing dataset
confusion_matrix(xg_boost, x_test, y_test)
# Build the model
# Stack the default decision tree and default gradient boosting models,
# with the default XGBoost model as the final estimator
default_models = [
("Default Decision Tree", decision_tree),
("Default Gradient Boost", gradient_boost),
]
final_estimator = xg_boost
stacking_classifier = StackingClassifier(
estimators=default_models, final_estimator=final_estimator
)
stacking_classifier.fit(x_train, y_train)
StackingClassifier(estimators=[('Default Decision Tree',
DecisionTreeClassifier(random_state=1)),
('Default Gradient Boost',
GradientBoostingClassifier(random_state=1))],
final_estimator=XGBClassifier(base_score=0.5,
booster='gbtree',
colsample_bylevel=1,
colsample_bynode=1,
colsample_bytree=1,
eval_metric='logloss', gamma=0,
gpu_id=-1,
importance_type='gain',
interaction_constraints='',
learning_rate=0.300000012,
max_delta_step=0, max_depth=6,
min_child_weight=1,
missing=nan,
monotone_constraints='()',
n_estimators=100, n_jobs=4,
num_parallel_tree=1,
random_state=1, reg_alpha=0,
reg_lambda=1,
scale_pos_weight=1,
subsample=1,
tree_method='exact',
validate_parameters=1,
verbosity=None))
# Calculate the model metrics for training dataset
stacking_metrics_train = model_metrics(stacking_classifier, x_train, y_train)
stacking_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.939494 | 0.736422 | 0.927565 | 0.821015 |
# Confusion matrix for training dataset
confusion_matrix(stacking_classifier, x_train, y_train)
# Calculate the model metrics for testing dataset
stacking_metrics_test = model_metrics(stacking_classifier, x_test, y_test)
stacking_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.872982 | 0.552239 | 0.708134 | 0.620545 |
# Confusion matrix for testing dataset
confusion_matrix(stacking_classifier, x_test, y_test)
boosting_default_hyperparameters = pd.concat(
[
ada_boost_metrics_train.T,
ada_boost_metrics_test.T,
gradient_boost_metrics_train.T,
gradient_boost_metrics_test.T,
xg_boost_metrics_train.T,
xg_boost_metrics_test.T,
stacking_metrics_train.T,
stacking_metrics_test.T,
],
axis=1,
)
boosting_default_hyperparameters.columns = [
"ada_boost_metrics_train",
"ada_boost_metrics_test",
"gradient_boost_metrics_train",
"gradient_boost_metrics_test",
"xg_boost_metrics_train",
"xg_boost_metrics_test",
"stacking_metrics_train",
"stacking_metrics_test",
]
boosting_default_hyperparameters
| | ada_boost_metrics_train | ada_boost_metrics_test | gradient_boost_metrics_train | gradient_boost_metrics_test | xg_boost_metrics_train | xg_boost_metrics_test | stacking_metrics_train | stacking_metrics_test |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 0.852800 | 0.841404 | 0.891331 | 0.860351 | 1.0 | 0.923509 | 0.939494 | 0.872982 |
| Recall | 0.349840 | 0.305970 | 0.490415 | 0.399254 | 1.0 | 0.686567 | 0.736422 | 0.552239 |
| Precision | 0.727575 | 0.672131 | 0.879656 | 0.737931 | 1.0 | 0.880383 | 0.927565 | 0.708134 |
| F1 | 0.472492 | 0.420513 | 0.629744 | 0.518160 | 1.0 | 0.771488 | 0.821015 | 0.620545 |
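The gap between training and test recall gives a rough indication of how much each model overfits. The sketch below computes that gap from the values in the table above; the dictionary layout is an illustration, not code from the notebook.

```python
# (train, test) recall copied from the comparison table above.
recall = {
    "ada_boost": (0.349840, 0.305970),
    "gradient_boost": (0.490415, 0.399254),
    "xg_boost": (1.0, 0.686567),
    "stacking": (0.736422, 0.552239),
}

# A larger gap means training recall overstates test recall by more.
gaps = {name: train - test for name, (train, test) in recall.items()}
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1]):
    print(f"{name:<15} gap = {gap:.6f}")
```

AdaBoost generalises most consistently but from a low base, while XGBoost has the largest gap and still the highest test recall of the four.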
AdaBoost Classifier
Gradient Boosting Classifier
XGBoost Classifier
Stacking Classifier
# Build the model
ada_tuned = AdaBoostClassifier(random_state=1)
# Parameters for tuning
parameters = {
    # Try different max_depth values for the base estimator
    "base_estimator": [
        DecisionTreeClassifier(max_depth=1),
        DecisionTreeClassifier(max_depth=2),
        DecisionTreeClassifier(max_depth=3),
    ],
    "n_estimators": np.arange(10, 110, 10),
    "learning_rate": np.arange(0.1, 2, 0.1),
}
# Type of scoring used to compare parameter combinations - We will choose Recall
recall_score = metrics.make_scorer(metrics.recall_score)
# Run the grid search
ada_grid_search = GridSearchCV(ada_tuned, parameters, scoring=recall_score, cv=5)
# Fit the grid search on the training data
ada_grid_search = ada_grid_search.fit(x_train, y_train)
# Keep the estimator that was refit with the best combination of parameters
ada_tuned = ada_grid_search.best_estimator_
ada_tuned
AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=3),
learning_rate=1.5000000000000002, n_estimators=100,
random_state=1)
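The value `learning_rate=1.5000000000000002` in the output above is floating-point drift from stepping through the grid in increments of 0.1. A minimal sketch of one way to avoid it, assuming the intended candidates are 0.1 through 1.9, is to build the grid from integer steps and round. The name `learning_rates` is hypothetical; such a list could be passed in place of the `np.arange` call.

```python
# Build 0.1, 0.2, ..., 1.9 from integer steps and round to one decimal,
# so the grid holds clean values such as 1.5 rather than 1.5000000000000002.
learning_rates = [round(0.1 * k, 1) for k in range(1, 20)]
print(learning_rates)
```

This changes only how the candidates are printed and compared; the values searched are the same to within floating-point precision.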
# Calculate the model metrics for training dataset
ada_tuned_metrics_train = model_metrics(ada_tuned, x_train, y_train)
ada_tuned_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.991872 | 0.966454 | 0.99018 | 0.978173 |
# Confusion matrix for training dataset
confusion_matrix(ada_tuned, x_train, y_train)
# Calculate the model metrics for testing dataset
ada_tuned_metrics_test = model_metrics(ada_tuned, x_test, y_test)
ada_tuned_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.865263 | 0.578358 | 0.662393 | 0.61753 |
# Confusion matrix for testing dataset
confusion_matrix(ada_tuned, x_test, y_test)
# Feature importances of the tuned AdaBoost model
print(
pd.DataFrame(
ada_tuned.feature_importances_, columns=["Imp"], index=x_train.columns
).sort_values(by="Imp", ascending=False)
)
                                  Imp
MonthlyIncome                0.285915
Age                          0.140680
DurationOfPitch              0.139285
NumberOfTrips                0.055795
NumberOfFollowups            0.046324
PitchSatisfactionScore       0.035116
PreferredPropertyStar        0.032168
Passport                     0.028942
CityTier                     0.027838
TypeofContact_Self Enquiry   0.020596
MaritalStatus_Single         0.019427
Designation_Senior Manager   0.016381
NumberOfChildrenVisiting     0.016023
Designation_Executive        0.015965
Occupation_Large Business    0.014168
OwnCar                       0.012562
Occupation_Salaried          0.012453
ProductPitched_Super Deluxe  0.012374
NumberOfPersonVisiting       0.011724
MaritalStatus_Unmarried      0.011212
Gender_Male                  0.011182
ProductPitched_Deluxe        0.010339
MaritalStatus_Married        0.009170
Occupation_Small Business    0.006336
ProductPitched_Standard      0.004433
Designation_Manager          0.002265
ProductPitched_King          0.001327
TypeofContact_Unknown        0.000000
Designation_VP               0.000000
feature_names = x_train.columns
importances = ada_tuned.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
# Build the model
gradient_tuned = GradientBoostingClassifier(
init=AdaBoostClassifier(random_state=1), random_state=1
)
# Parameters for tuning
parameters = {
"n_estimators": np.arange(50, 200, 25),
"subsample": [0.7, 0.8, 0.9, 1],
"max_features": [0.7, 0.8, 0.9, 1],  # note: the integer 1 means one feature per split; 1.0 would mean all features
"max_depth": [3, 5, 7, 10],
}
# Type of scoring used to compare parameter combinations - We will choose Recall
recall_score = metrics.make_scorer(metrics.recall_score)
# Run the grid search
gradient_grid_search = GridSearchCV(
gradient_tuned, parameters, scoring=recall_score, cv=5
)
# Fit the best algorithm to the data.
gradient_grid_search = gradient_grid_search.fit(x_train, y_train)
# Set the grid search object to the best combination of parameters
gradient_tuned = gradient_grid_search.best_estimator_
gradient_tuned
GradientBoostingClassifier(init=AdaBoostClassifier(random_state=1),
max_depth=10, max_features=0.9, n_estimators=175,
random_state=1, subsample=0.9)
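The cost of an exhaustive grid search grows with the product of the candidate list lengths. As a rough sketch using only the standard library, the size of the grid above can be counted as follows. `count_grid_fits` is a hypothetical helper, not part of scikit-learn, and the candidate values are copied from the `parameters` dictionary above.

```python
from itertools import product


def count_grid_fits(param_grid, cv):
    """Return (number of parameter combinations, number of cross-validation fits)."""
    combinations = sum(1 for _ in product(*param_grid.values()))
    return combinations, combinations * cv


# Candidate values copied from the gradient boosting grid above
gradient_grid = {
    "n_estimators": list(range(50, 200, 25)),  # 50, 75, ..., 175
    "subsample": [0.7, 0.8, 0.9, 1],
    "max_features": [0.7, 0.8, 0.9, 1],
    "max_depth": [3, 5, 7, 10],
}

n_combinations, n_fits = count_grid_fits(gradient_grid, cv=5)
print(n_combinations, n_fits)  # 384 1920
```

With the default `refit=True`, `GridSearchCV` also trains the best combination once more on the full training set, so the search performs one fit beyond this count.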
# Calculate the model metrics for training dataset
gradient_tuned_metrics_train = model_metrics(gradient_tuned, x_train, y_train)
gradient_tuned_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 1.0 | 1.0 | 1.0 | 1.0 |
# Confusion matrix for training dataset
confusion_matrix(gradient_tuned, x_train, y_train)
# Calculate the model metrics for testing dataset
gradient_tuned_metrics_test = model_metrics(gradient_tuned, x_test, y_test)
gradient_tuned_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.932632 | 0.716418 | 0.90566 | 0.8 |
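As a quick sanity check on the table above, F1 is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The sketch below uses a hypothetical helper, `f1_from_precision_recall`, and the reported test-set values for the tuned gradient boosting model.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall; 0.0 if both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Test-set values reported above for the tuned gradient boosting model
precision, recall = 0.90566, 0.716418
f1 = f1_from_precision_recall(precision, recall)
print(round(f1, 3))  # 0.8
```

Because the harmonic mean sits closer to the smaller of its two inputs, the F1 of 0.8 is pulled toward the lower recall rather than the higher precision.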
# Confusion matrix for testing dataset
confusion_matrix(gradient_tuned, x_test, y_test)
# importance of features in the tree building
print(
pd.DataFrame(
gradient_tuned.feature_importances_, columns=["Imp"], index=x_train.columns
).sort_values(by="Imp", ascending=False)
)
| | Imp |
|---|---|
| Age | 0.149108 |
| MonthlyIncome | 0.134276 |
| DurationOfPitch | 0.128983 |
| Designation_Executive | 0.066960 |
| NumberOfTrips | 0.064568 |
| Passport | 0.063312 |
| PitchSatisfactionScore | 0.063092 |
| NumberOfFollowups | 0.048993 |
| CityTier | 0.043177 |
| PreferredPropertyStar | 0.029664 |
| MaritalStatus_Single | 0.022848 |
| Occupation_Large Business | 0.019116 |
| NumberOfPersonVisiting | 0.018369 |
| Gender_Male | 0.017906 |
| MaritalStatus_Married | 0.017905 |
| TypeofContact_Self Enquiry | 0.017116 |
| NumberOfChildrenVisiting | 0.014720 |
| Occupation_Small Business | 0.014500 |
| MaritalStatus_Unmarried | 0.014228 |
| OwnCar | 0.011871 |
| Occupation_Salaried | 0.010586 |
| ProductPitched_Deluxe | 0.005632 |
| Designation_Manager | 0.005269 |
| ProductPitched_Super Deluxe | 0.005069 |
| Designation_Senior Manager | 0.004369 |
| ProductPitched_Standard | 0.004346 |
| ProductPitched_King | 0.001975 |
| Designation_VP | 0.001360 |
| TypeofContact_Unknown | 0.000679 |
feature_names = x_train.columns
importances = gradient_tuned.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
# Build the model
xgboost_tuned = XGBClassifier(random_state=1, eval_metric="logloss")
# Parameters for tuning
parameters = {
"n_estimators": [10, 30, 50],
"scale_pos_weight": [1, 2, 5],
"subsample": [0.7, 0.9, 1],
"learning_rate": [0.05, 0.1, 0.2],
"colsample_bytree": [0.7, 0.9, 1],
"colsample_bylevel": [0.5, 0.7, 1],
}
# Type of scoring used to compare parameter combinations - We will choose Recall
recall_score = metrics.make_scorer(metrics.recall_score)
# Run the grid search
xgboost_grid_search = GridSearchCV(
xgboost_tuned, parameters, scoring=recall_score, cv=5
)
# Fit the best algorithm to the data.
xgboost_grid_search = xgboost_grid_search.fit(x_train, y_train)
# Set the grid search object to the best combination of parameters
xgboost_tuned = xgboost_grid_search.best_estimator_
xgboost_tuned
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=0.7,
colsample_bynode=1, colsample_bytree=0.7, eval_metric='logloss',
gamma=0, gpu_id=-1, importance_type='gain',
interaction_constraints='', learning_rate=0.1, max_delta_step=0,
max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=50, n_jobs=4,
num_parallel_tree=1, random_state=1, reg_alpha=0, reg_lambda=1,
scale_pos_weight=5, subsample=0.9, tree_method='exact',
validate_parameters=1, verbosity=None)
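The grid search selected `scale_pos_weight=5`, the largest candidate offered. The XGBoost documentation suggests the ratio of negative to positive examples as a typical starting value for this parameter. The sketch below computes that ratio. `suggested_scale_pos_weight` is a hypothetical helper, and the labels are toy values for illustration, not the Tourism data.

```python
def suggested_scale_pos_weight(labels):
    """Ratio of negative to positive examples for binary labels (0 or 1)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = sum(1 for y in labels if y == 0)
    if positives == 0:
        raise ValueError("no positive examples in labels")
    return negatives / positives


# Toy labels for illustration only: 8 negatives, 2 positives
toy_labels = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
print(suggested_scale_pos_weight(toy_labels))  # 4.0
```

In this notebook the helper could be called on `y_train` to decide which candidate values are worth putting in the grid.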
# Calculate the model metrics for training dataset
xgboost_tuned_metrics_train = model_metrics(xgboost_tuned, x_train, y_train)
xgboost_tuned_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.93528 | 0.960064 | 0.759798 | 0.848271 |
# Confusion matrix for training dataset
confusion_matrix(xgboost_tuned, x_train, y_train)
# Calculate the model metrics for testing dataset
xgboost_tuned_metrics_test = model_metrics(xgboost_tuned, x_test, y_test)
xgboost_tuned_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.86807 | 0.768657 | 0.620482 | 0.686667 |
# Confusion matrix for testing dataset
confusion_matrix(xgboost_tuned, x_test, y_test)
# importance of features in the tree building
print(
pd.DataFrame(
xgboost_tuned.feature_importances_, columns=["Imp"], index=x_train.columns
).sort_values(by="Imp", ascending=False)
)
| | Imp |
|---|---|
| Passport | 0.160875 |
| Designation_Executive | 0.097580 |
| ProductPitched_Standard | 0.049446 |
| MaritalStatus_Single | 0.047653 |
| MaritalStatus_Married | 0.042021 |
| CityTier | 0.040125 |
| Occupation_Large Business | 0.037121 |
| PreferredPropertyStar | 0.036544 |
| ProductPitched_Super Deluxe | 0.031379 |
| NumberOfFollowups | 0.030871 |
| ProductPitched_King | 0.030581 |
| Age | 0.030482 |
| PitchSatisfactionScore | 0.028203 |
| DurationOfPitch | 0.027626 |
| MonthlyIncome | 0.027558 |
| Designation_VP | 0.026220 |
| Designation_Manager | 0.025742 |
| NumberOfTrips | 0.025350 |
| MaritalStatus_Unmarried | 0.025278 |
| ProductPitched_Deluxe | 0.025229 |
| Occupation_Salaried | 0.024384 |
| Occupation_Small Business | 0.023677 |
| Gender_Male | 0.021083 |
| TypeofContact_Self Enquiry | 0.019860 |
| OwnCar | 0.019105 |
| NumberOfPersonVisiting | 0.014308 |
| Designation_Senior Manager | 0.013823 |
| NumberOfChildrenVisiting | 0.010915 |
| TypeofContact_Unknown | 0.006961 |
feature_names = x_train.columns
importances = xgboost_tuned.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
# Build the model
# Stack the tuned bagging, cost-complexity decision tree, and gradient boosting models,
# with the tuned XGBoost model as the final estimator
tuned_models = [
("Tuned Bagging with GridSearch", bagging_tuned),
("Tuned Decision Tree with Cost-complexity", tree_cost_complex_model),
("Tuned Gradient Boost", gradient_tuned),
]
tuned_final_estimator = xgboost_tuned
tuned_stacking_classifier = StackingClassifier(
estimators=tuned_models, final_estimator=tuned_final_estimator
)
tuned_stacking_classifier.fit(x_train, y_train)
StackingClassifier(estimators=[('Tuned Bagging with GridSearch',
BaggingClassifier(max_features=0.9,
max_samples=0.9,
n_estimators=50,
random_state=1)),
('Tuned Decision Tree with Cost-complexity',
DecisionTreeClassifier(random_state=1)),
('Tuned Gradient Boost',
GradientBoostingClassifier(init=AdaBoostClassifier(random_state=1),
max_depth=10,
max_features=0.9,
n_esti...
eval_metric='logloss', gamma=0,
gpu_id=-1,
importance_type='gain',
interaction_constraints='',
learning_rate=0.1,
max_delta_step=0, max_depth=6,
min_child_weight=1,
missing=nan,
monotone_constraints='()',
n_estimators=50, n_jobs=4,
num_parallel_tree=1,
random_state=1, reg_alpha=0,
reg_lambda=1,
scale_pos_weight=5,
subsample=0.9,
tree_method='exact',
validate_parameters=1,
verbosity=None))
# Calculate the model metrics for training dataset
stacking_tuned_metrics_train = model_metrics(
tuned_stacking_classifier, x_train, y_train
)
stacking_tuned_metrics_train
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.999398 | 1.0 | 0.996815 | 0.998405 |
# Confusion matrix for training dataset
confusion_matrix(tuned_stacking_classifier, x_train, y_train)
# Calculate the model metrics for testing dataset
stacking_tuned_metrics_test = model_metrics(tuned_stacking_classifier, x_test, y_test)
stacking_tuned_metrics_test
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.898246 | 0.899254 | 0.671309 | 0.76874 |
# Confusion matrix for testing dataset
confusion_matrix(tuned_stacking_classifier, x_test, y_test)
boosting_tuned_hyperparameters = pd.concat(
[
ada_tuned_metrics_train.T,
ada_tuned_metrics_test.T,
gradient_tuned_metrics_train.T,
gradient_tuned_metrics_test.T,
xgboost_tuned_metrics_train.T,
xgboost_tuned_metrics_test.T,
stacking_tuned_metrics_train.T,
stacking_tuned_metrics_test.T,
],
axis=1,
)
boosting_tuned_hyperparameters.columns = [
"ada_tuned_metrics_train",
"ada_tuned_metrics_test",
"gradient_tuned_metrics_train",
"gradient_tuned_metrics_test",
"xgboost_tuned_metrics_train",
"xgboost_tuned_metrics_test",
"stacking_tuned_metrics_train",
"stacking_tuned_metrics_test",
]
boosting_tuned_hyperparameters
| | ada_tuned_metrics_train | ada_tuned_metrics_test | gradient_tuned_metrics_train | gradient_tuned_metrics_test | xgboost_tuned_metrics_train | xgboost_tuned_metrics_test | stacking_tuned_metrics_train | stacking_tuned_metrics_test |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 0.991872 | 0.865263 | 1.0 | 0.932632 | 0.935280 | 0.868070 | 0.999398 | 0.898246 |
| Recall | 0.966454 | 0.578358 | 1.0 | 0.716418 | 0.960064 | 0.768657 | 1.000000 | 0.899254 |
| Precision | 0.990180 | 0.662393 | 1.0 | 0.905660 | 0.759798 | 0.620482 | 0.996815 | 0.671309 |
| F1 | 0.978173 | 0.617530 | 1.0 | 0.800000 | 0.848271 | 0.686667 | 0.998405 | 0.768740 |
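One way to read the comparison table is to look at the gap between training and testing recall for each tuned model, since a larger gap suggests more overfitting. The following is a minimal sketch with the recall values copied by hand from the table above. The dictionary names are illustrative and do not appear elsewhere in the notebook.

```python
# Recall values copied from the comparison table above: (train, test)
recall_by_model = {
    "ada_tuned": (0.966454, 0.578358),
    "gradient_tuned": (1.0, 0.716418),
    "xgboost_tuned": (0.960064, 0.768657),
    "stacking_tuned": (1.0, 0.899254),
}

# Gap between training and testing recall; a larger gap suggests more overfitting
recall_gap = {
    name: round(train - test, 6) for name, (train, test) in recall_by_model.items()
}

# Print the models from the smallest gap to the largest
for name, gap in sorted(recall_gap.items(), key=lambda item: item[1]):
    print(f"{name}: {gap:.6f}")
```

On these numbers the stacking model has the smallest recall gap and tuned AdaBoost the largest.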
AdaBoost Tuned with GridSearch
Gradient Boosting Tuned with GridSearch
XGBoost Tuned with GridSearch
Stacking Model Tuned with GridSearch
training_models_df = pd.concat(
[
bagging_metrics_train.T,
rf_metrics_train.T,
tree_metrics_train.T,
bagging_tuned_metrics_train.T,
bagging_weighted_metrics_train.T,
rf_tuned_metrics_train.T,
rf_weighted_metrics_train.T,
tree_tuned_metrics_train.T,
tree_cost_complex_metrics_train.T,
ada_boost_metrics_train.T,
gradient_boost_metrics_train.T,
xg_boost_metrics_train.T,
stacking_metrics_train.T,
ada_tuned_metrics_train.T,
gradient_tuned_metrics_train.T,
xgboost_tuned_metrics_train.T,
stacking_tuned_metrics_train.T,
],
axis=1,
)
training_models_df.columns = [
"bagging_metrics_train",
"rf_metrics_train",
"tree_metrics_train",
"bagging_tuned_metrics_train",
"bagging_weighted_metrics_train",
"rf_tuned_metrics_train",
"rf_weighted_metrics_train",
"tree_tuned_metrics_train",
"tree_cost_complex_metrics_train",
"ada_boost_metrics_train",
"gradient_boost_metrics_train",
"xg_boost_metrics_train",
"stacking_metrics_train",
"ada_tuned_metrics_train",
"gradient_tuned_metrics_train",
"xgboost_tuned_metrics_train",
"stacking_tuned_metrics_train",
]
training_models_df
| | bagging_metrics_train | rf_metrics_train | tree_metrics_train | bagging_tuned_metrics_train | bagging_weighted_metrics_train | rf_tuned_metrics_train | rf_weighted_metrics_train | tree_tuned_metrics_train | tree_cost_complex_metrics_train | ada_boost_metrics_train | gradient_boost_metrics_train | xg_boost_metrics_train | stacking_metrics_train | ada_tuned_metrics_train | gradient_tuned_metrics_train | xgboost_tuned_metrics_train | stacking_tuned_metrics_train |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 0.994281 | 1.0 | 1.0 | 1.0 | 0.994582 | 0.917219 | 0.894040 | 0.829922 | 1.0 | 0.852800 | 0.891331 | 1.0 | 0.939494 | 0.991872 | 1.0 | 0.935280 | 0.999398 |
| Recall | 0.971246 | 1.0 | 1.0 | 1.0 | 0.972843 | 0.595847 | 0.691693 | 0.597444 | 1.0 | 0.349840 | 0.490415 | 1.0 | 0.736422 | 0.966454 | 1.0 | 0.960064 | 1.000000 |
| Precision | 0.998358 | 1.0 | 1.0 | 1.0 | 0.998361 | 0.944304 | 0.731419 | 0.544396 | 1.0 | 0.727575 | 0.879656 | 1.0 | 0.927565 | 0.990180 | 1.0 | 0.759798 | 0.996815 |
| F1 | 0.984615 | 1.0 | 1.0 | 1.0 | 0.985437 | 0.730656 | 0.711002 | 0.569688 | 1.0 | 0.472492 | 0.629744 | 1.0 | 0.821015 | 0.978173 | 1.0 | 0.848271 | 0.998405 |
testing_models_df = pd.concat(
[
bagging_metrics_test.T,
rf_metrics_test.T,
tree_metrics_test.T,
bagging_tuned_metrics_test.T,
bagging_weighted_metrics_test.T,
rf_tuned_metrics_test.T,
rf_weighted_metrics_test.T,
tree_tuned_metrics_test.T,
tree_cost_complex_metrics_test.T,
ada_boost_metrics_test.T,
gradient_boost_metrics_test.T,
xg_boost_metrics_test.T,
stacking_metrics_test.T,
ada_tuned_metrics_test.T,
gradient_tuned_metrics_test.T,
xgboost_tuned_metrics_test.T,
stacking_tuned_metrics_test.T,
],
axis=1,
)
testing_models_df.columns = [
"bagging_metrics_test",
"rf_metrics_test",
"tree_metrics_test",
"bagging_tuned_metrics_test",
"bagging_weighted_metrics_test",
"rf_tuned_metrics_test",
"rf_weighted_metrics_test",
"tree_tuned_metrics_test",
"tree_cost_complex_metrics_test",
"ada_boost_metrics_test",
"gradient_boost_metrics_test",
"xg_boost_metrics_test",
"stacking_metrics_test",
"ada_tuned_metrics_test",
"gradient_tuned_metrics_test",
"xgboost_tuned_metrics_test",
"stacking_tuned_metrics_test",
]
testing_models_df
| | bagging_metrics_test | rf_metrics_test | tree_metrics_test | bagging_tuned_metrics_test | bagging_weighted_metrics_test | rf_tuned_metrics_test | rf_weighted_metrics_test | tree_tuned_metrics_test | tree_cost_complex_metrics_test | ada_boost_metrics_test | gradient_boost_metrics_test | xg_boost_metrics_test | stacking_metrics_test | ada_tuned_metrics_test | gradient_tuned_metrics_test | xgboost_tuned_metrics_test | stacking_tuned_metrics_test |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Accuracy | 0.903860 | 0.910877 | 0.893333 | 0.915088 | 0.911579 | 0.861754 | 0.842105 | 0.804211 | 0.893333 | 0.841404 | 0.860351 | 0.923509 | 0.872982 | 0.865263 | 0.932632 | 0.868070 | 0.898246 |
| Recall | 0.630597 | 0.597015 | 0.742537 | 0.638060 | 0.615672 | 0.388060 | 0.514925 | 0.526119 | 0.742537 | 0.305970 | 0.399254 | 0.686567 | 0.552239 | 0.578358 | 0.716418 | 0.768657 | 0.899254 |
| Precision | 0.816425 | 0.893855 | 0.705674 | 0.876923 | 0.877660 | 0.759124 | 0.592275 | 0.481229 | 0.705674 | 0.672131 | 0.737931 | 0.880383 | 0.708134 | 0.662393 | 0.905660 | 0.620482 | 0.671309 |
| F1 | 0.711579 | 0.715884 | 0.723636 | 0.738661 | 0.723684 | 0.513580 | 0.550898 | 0.502674 | 0.723636 | 0.420513 | 0.518160 | 0.771488 | 0.620545 | 0.617530 | 0.800000 | 0.686667 | 0.768740 |
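Since recall is the metric chosen for tuning, the testing table can be summarized by ranking the models on test recall. The sketch below does this with values copied by hand from the Recall row above. The dictionary keys are shortened column names chosen for illustration.

```python
# Test recall copied from the Recall row of the table above
test_recall = {
    "bagging": 0.630597,
    "rf": 0.597015,
    "tree": 0.742537,
    "bagging_tuned": 0.638060,
    "bagging_weighted": 0.615672,
    "rf_tuned": 0.388060,
    "rf_weighted": 0.514925,
    "tree_tuned": 0.526119,
    "tree_cost_complex": 0.742537,
    "ada_boost": 0.305970,
    "gradient_boost": 0.399254,
    "xg_boost": 0.686567,
    "stacking": 0.552239,
    "ada_tuned": 0.578358,
    "gradient_tuned": 0.716418,
    "xgboost_tuned": 0.768657,
    "stacking_tuned": 0.899254,
}

# Sort from highest to lowest test recall and show the top three
ranked = sorted(test_recall.items(), key=lambda item: item[1], reverse=True)
for name, recall in ranked[:3]:
    print(f"{name}: {recall:.6f}")
```

A ranking on recall alone ignores precision. For example, the tuned stacking model leads on recall (0.899254) but its test precision (0.671309) is well below that of the tuned gradient boosting model (0.905660).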
The types of customers likely to purchase the newly introduced travel package are:
Age is another feature that the company can consider. Families with kids are likely to purchase the travel package, so the company can add more kid-friendly features to attract more families.